(57) The present invention provides DNA molecules that constitute fragments of the genome
of a plant, and polypeptides encoded thereby. The DNA molecules are useful for specifying
a gene product in cells, either as a promoter or as a protein coding sequence or as a UTR
or as a 3' termination sequence, and are also useful in controlling the behavior of a gene
in the chromosome, in controlling the expression of a gene or as tools for genetic mapping,
recognizing or isolating identical or related DNA fragments, or identification of a
particular individual organism, or for clustering of a group of organisms with a common trait.

Arabidopsis DNA is used in the present experiment, but the procedure is a general one.
sequence. Examples of constructs include ribozymes comprising RNA encoded by an SDF or by a sequence
complementary thereto, antisense constructs, constructs comprising coding regions or parts thereof,
constructs comprising promoters, introns, untranslated regions, scaffold attachment regions, methylating
regions, enhancing or reducing regions, DNA and chromatin conformation modifying sequences, etc. Such
constructs can be constructed using viral, plasmid, bacterial artificial chromosomes (BACs), plasmid
artificial chromosomes (PACs), autonomous plant plasmids, plant artificial chromosomes or other types of
vectors and exist in the plant as autonomous replicating sequences or as DNA integrated into the genome.
When inserted into a host cell the construct is, preferably, functionally integrated with, or operatively
linked to, a heterologous polynucleotide. For instance, a coding region from an SDF might be operably
linked to a promoter that is functional in a plant.

[0010] The present invention also resides in host cells, including bacterial or yeast cells or plant cells,
and plants that harbor constructs such as described above. Another aspect of the invention relates to
methods for modulating expression of specific genes in plants by expression of the coding sequence of the
constructs; by regulation of expression of one or more endogenous genes in a plant or by suppression of
expression of the polynucleotides of the invention in a plant. Methods of modulation of gene expression
include without limitation (1) inserting into a host cell additional copies of a polynucleotide comprising
a coding sequence; (2) modulating an endogenous promoter in a host cell; (3) inserting antisense or
ribozyme constructs into a host cell and (4) inserting into a host cell a polynucleotide comprising a
sequence encoding a variant, fragment, or fusion of the native polypeptides of the instant invention.

BRIEF DESCRIPTION OF THE TABLES 

[0011] The sequences of exemplary SDFs and polypeptides corresponding to the coding sequences of the instant
invention are described in Reference Tables 1 and 2, "REF Tables 1 and 2"; and in Sequence Tables 1 and 2,
"SEQ Tables 1 and 2." The REF Tables refer to a number of "Maximum Length Sequences" or "MLS." Each MLS
corresponds to the longest cDNA obtained, either by cloning or by the prediction from genomic sequence. The
sequence of the MLS is the cDNA sequence as described in the Ac subsection of the REF Tables.
[0012] The REF Table includes the following information relating to each MLS: 

I. cDNA Sequence 

A. 5' UTR

B. Coding Sequence 

C. 3' UTR 

II. Genomic Sequence 

A. Exons 

B. Introns 

C. Promoters 

III. Link of cDNA Sequences to Clone IDs 

IV. Multiple Transcription Start Sites 

V. Polypeptide Sequences 

A. Signal Peptide 

B. Domains 

C. Related Polypeptides 

VI. Related Polynucleotide Sequences 
I. cDNA SEQUENCE 

[0013] The REF Tables indicate which sequence in the SEQ Tables represents the sequence of each MLS. The MLS 
sequence can comprise 5' and 3' UTR as well as coding sequences. In addition, specific cDNA clone numbers also 
are included in the REF Tables when the MLS sequence relates to a specific cDNA clone. 

A. 5' UTR

[0014] The location of the 5' UTR can be determined by comparing the most 5' MLS sequence with the corresponding
genomic sequence as indicated in the REF Tables. The sequence that matches, beginning at any of the transcriptional
start sites and ending at the last nucleotide before any of the translational start sites corresponds to the 5' UTR.

B. Coding Region 

[0015] The coding region is the sequence in any open reading frame found in the MLS. Coding regions of interest
are indicated in the PolyP SEQ subsection of the REF Tables.

C. 3' UTR 

[0016] The location of the 3' UTR can be determined by comparing the most 3' MLS sequence with the corresponding
genomic sequence as indicated in the REF Tables. The sequence that matches, beginning at the translational stop
site and ending at the last nucleotide of the MLS corresponds to the 3' UTR.
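
The boundary rules in [0014] to [0016] can be sketched in Python. This is a minimal illustration rather than anything taken from the REF Tables: the function name `split_mls`, the 1-based coordinates, and the toy sequence are assumptions, and the sketch assumes the MLS begins at the transcription start site and that the 3' UTR starts immediately after the stop codon.

```python
def split_mls(mls, start_site, stop_site):
    """Split an MLS cDNA into (5' UTR, coding region, 3' UTR).

    start_site: 1-based position of the first base of the start codon.
    stop_site:  1-based position of the last base of the stop codon.
    Both conventions are assumptions made for this sketch.
    """
    five_utr = mls[:start_site - 1]           # bases before the start codon
    coding = mls[start_site - 1:stop_site]    # start codon through stop codon
    three_utr = mls[stop_site:]               # after the stop codon to the end of the MLS
    return five_utr, coding, three_utr


# Toy sequence: 6-base 5' UTR, 12-base coding region, 6-base 3' UTR.
mls = "GGCACC" + "ATGGCTTGGTAA" + "TTTCCA"
print(split_mls(mls, start_site=7, stop_site=18))
# ('GGCACC', 'ATGGCTTGGTAA', 'TTTCCA')
```

When an MLS has several transcriptional or translational start sites, the same split would be repeated once per combination of sites.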

II. GENOMIC SEQUENCE

[0017] Further, the REF Tables indicate the specific "gi" number of the genomic sequence if the sequence resides in
a public databank. For each genomic sequence, the REF Tables indicate which regions are included in the MLS. These
regions can include the 5' and 3' UTRs as well as the coding sequence of the MLS. See, for example, the scheme below:

[Schematic. Labels recoverable from the figure: Region 1, Region 2, Region 3; Promoter, 5' UTR, Exon, Intron,
Exon, Intron, Exon, 3' UTR; Translational Start Site; Stop Codon.]


[0018] The REF Tables report the first and last base of each region that are included in an MLS sequence. An example
is shown below:

gi No. 47000;
37102 ... 37497
37593 ... 37925

The numbers indicate that the MLS contains the following sequences from two regions of gi No. 47000: a first region
including bases 37102-37497, and a second region including bases 37593-37925.
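
As a rough illustration of how such an entry can be read programmatically, the Python sketch below parses the two region lines of the example above and reports each region's length. The function names and the exact text layout of the entry are assumptions made for illustration; only the coordinates come from the example.

```python
import re


def parse_regions(lines):
    """Parse 'first ... last' entries into (first, last) integer pairs."""
    regions = []
    for line in lines:
        match = re.fullmatch(r"\s*(\d+)\s*\.\.\.\s*(\d+)\s*", line)
        if match:
            regions.append((int(match.group(1)), int(match.group(2))))
    return regions


def region_lengths(regions):
    """Length of each region, counting both the first and the last base."""
    return [last - first + 1 for first, last in regions]


entry = ["37102 ... 37497", "37593 ... 37925"]
regions = parse_regions(entry)
print(regions)                  # [(37102, 37497), (37593, 37925)]
print(region_lengths(regions))  # [396, 333]
```

Under these coordinates the two regions contribute 396 and 333 bases to the MLS, and the 95 genomic bases between them (37498-37592) are not part of it.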

A. EXON SEQUENCES

[0019] The location of the exons can be determined by comparing the sequence of the regions from the genomic
sequences with the corresponding MLS sequence as indicated by the REF Tables.

i. INITIAL EXON

[0020] To determine the location of the initial exon, information from the

(1) polypeptide sequence section;
(2) cDNA polynucleotide section; and
(3) the genomic sequence section

of the REF Tables is used. First, the polypeptide section will indicate where the translational start site is located in
the MLS sequence. The MLS sequence can be matched to the genomic sequence that corresponds to the MLS. Based
on the match between the MLS and corresponding genomic sequences, the location of the translational start site can
be determined in one of the regions of the genomic sequence. The location of this translational start site is the start of
the first exon.

[0021] Generally, the last base of the exon of the corresponding genomic region, in which the translational start site 
that corresponds to the MLS

[0030] To determine the location of the transcription start sites with the negative numbers, the MLS sequence is
aligned with the corresponding genomic sequence. In the instances when a public genomic sequence is referenced,
the relevant corresponding genomic sequence can be found by direct reference to the nucleotide sequence indicated
by the "gi" number shown in the public genomic DNA section of the REF Tables. When the position is a negative number,
the transcription start site is located in the corresponding genomic sequence upstream of the base that matches the
beginning of the MLS sequence in the alignment. The negative number is relative to the first base of the MLS sequence
which matches the genomic sequence corresponding to the relevant "gi" number.
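
The arithmetic behind a negative start-site position can be sketched as follows. The function name is hypothetical, and two conventions are assumed here because the text does not state them: the MLS aligns to the forward strand of the genomic sequence, and position -1 is the base immediately upstream of the first MLS base (there is no position 0).

```python
def tss_genomic_position(mls_start_in_genome, tss_position):
    """Map a negative transcription start site position onto the genomic sequence.

    mls_start_in_genome: genomic coordinate matching the first base of the MLS.
    tss_position: negative offset reported for the start site.
    Assumes forward-strand alignment and that -1 is the base just upstream
    of the first MLS base (both are assumptions of this sketch).
    """
    if tss_position >= 0:
        raise ValueError("this sketch only handles negative positions")
    return mls_start_in_genome + tss_position


# If the first MLS base matched genomic base 37102, a start site at -25
# would lie 25 bases upstream of it.
print(tss_genomic_position(37102, -25))  # 37077
```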

[0031] In the instances when no public genomic DNA is referenced, the relevant nucleotide sequence for alignment
is the nucleotide sequence associated with the amino acid sequence designated by "gi" number of the later PolyP SEQ
subsection.

V. Polypeptide Sequences 

[0032] The PolyP SEQ subsection lists SEQ ID NOs and Ceres SEQ ID NO for polypeptide sequences corresponding
to the coding sequence of the MLS sequence and the location of the translational start site with the coding sequence
of the MLS sequence.

[0033] The MLS sequence can have multiple translational start sites and can be capable of producing more than
one polypeptide sequence.

A. Signal Peptide 

[0034] The REF Tables also indicate in subsection (B) the cleavage site of the putative signal peptide of the
polypeptide corresponding to the coding sequence of the MLS sequence. Typically, signal peptide coding sequences
comprise a sequence encoding the first residue of the polypeptide to the cleavage site residue.
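
A short Python sketch of that relationship: given a coding sequence and the residue number of the cleavage site, the signal peptide coding sequence is the leading stretch of codons. The function name and the toy sequence are illustrative assumptions, and the cleavage-site residue is treated as included in the signal peptide.

```python
def signal_peptide_coding(cds, cleavage_residue):
    """Return the bases encoding residue 1 through the cleavage-site residue.

    cleavage_residue is a 1-based residue number, assumed here to be the
    last residue of the signal peptide.
    """
    return cds[:3 * cleavage_residue]


cds = "ATGAAGGCTCTGTTTGGA"              # toy coding sequence, 6 codons
print(signal_peptide_coding(cds, 4))    # ATGAAGGCTCTG
```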

B. Domains 

[0035] Subsection (C) provides information regarding identified domains (where present) within the polypeptide and
(where present) a name for the polypeptide domain.

C. Related Polypeptides 

[0036] Subsection (Dp) provides (where present) information concerning amino acid sequences that are found to be
related and have some percentage of sequence identity to the polypeptide sequences of REF and SEQ TABLES 1
AND 2. These related sequences are identified by a "gi" number.

VI. Related Polynucleotide Sequences

[0037] Subsection (Dn) provides polynucleotide sequences (where present) that are related to and have some
percentage of sequence identity to the MLS or corresponding genomic sequence.


Abbreviation                   Description

Max Len. Seq.                  Maximum Length Sequence
rel to                         Related to
Clone Ids                      Clone ID numbers
Pub gDNA                       Public Genomic DNA
gi No.                         gi number
Gen. seq. in cDNA              Genomic Sequence in cDNA (Each region for a single gene prediction is
                               listed on a separate line. In the case of multiple gene predictions, the
                               group of regions relating to a single prediction are separated by a
                               blank line)
(Ac) cDNA SEQ                  cDNA sequence
- Pat. Appln. SEQ ID NO        Patent Application SEQ ID NO:
- Ceres SEQ ID NO: 1673877     Ceres SEQ ID NO:
- SEQ # w. TSS                 Location within the cDNA sequence, SEQ ID NO:, of Transcription Start
                               Sites which are listed below
- Clone ID #:#->#              Clone ID comprises bases # to # of the cDNA Sequence
PolyP SEQ                      Polypeptide Sequence
- Pat. Appln. SEQ ID NO:       Patent Application SEQ ID NO:
- Ceres SEQ ID NO              Ceres SEQ ID NO:
- Loc. SEQ ID NO: @ nt.        Location of translational start site in cDNA of SEQ ID NO: at nucleotide
                               number
(C) Pred. PP Nom. & Annot.     Nomination and Annotation of Domains within Predicted Polypeptide(s)
- (Title)                      Name of Domain
- Loc. SEQ ID NO #:#-># aa.    Location of the domain within the polypeptide of SEQ ID NO: from # to #
                               amino acid residues.
(Dp) Rel. AA SEQ               Related Amino Acid Sequences
- Align. NO                    Alignment number
- gi No                        Gi number
- Desp.                        Description
- % Ident.                     Percent identity
- Align. Len.                  Alignment Length
- Loc. SEQ ID NO: # -> # aa    Location within SEQ ID NO: from # to # amino acid residue.


DETAILED DESCRIPTION OF THE INVENTION 

[0038] The invention relates to (I) polynucleotides and methods of use thereof, such as

IA. Probes, Primers and Substrates;

IB. Methods of Detection and Isolation;

B.1. Hybridization;
B.2. Methods of Mapping;
B.3. Southern Blotting;
B.4. Isolating cDNA from Related Organisms;
B.5. Isolating and/or Identifying Orthologous Genes

IC. Methods of Inhibiting Gene Expression

C.1. Antisense
C.2. Ribozyme Constructs;
C.3. Chimeraplasts;
C.4. Co-Suppression;
C.5. Transcriptional Silencing
C.6. Other Methods to Inhibit Gene Expression

ID. Methods of Functional Analysis;

IE. Promoter Sequences and Their Use;

IF. UTRs and/or Intron Sequences and Their Use; and

IG. Coding Sequences and Their Use.


[0039] The invention also relates to (II) polypeptides and proteins and methods of use thereof, such as

IIA. Native Polypeptides and Proteins

A.1 Antibodies
A.2 In Vitro Applications

IIB. Polypeptide Variants, Fragments and Fusions

B.1 Variants
B.2 Fragments
B.3 Fusions

[0040] The invention also includes (III) methods of modulating polypeptide production, such as

IIIA. Suppression

A.1 Antisense
A.2 Ribozymes
A.3 Co-suppression
A.4 Insertion of Sequences into the Gene to be Modulated
A.5 Promoter Modulation
A.6 Expression of Genes containing Dominant-Negative Mutations

IIIB. Enhanced Expression

B.1 Insertion of an Exogenous Gene
B.2 Promoter Modulation

[0041] The invention further concerns (IV) gene constructs and vector construction, such as

IVA. Coding Sequences
IVB. Promoters
IVC. Signal Peptides

[0042] The invention still further relates to

V. Transformation Techniques

Definitions 

[0043] Allelic variant: An "allelic variant" is an alternative form of the same SDF, which resides at the same
chromosomal locus in the organism. Allelic variations can occur in any portion of the gene sequence, including regulatory
regions. Allelic variants can arise by normal genetic variation in a population. Allelic variants can also be produced by
genetic engineering methods. An allelic variant can be one that is found in a naturally occurring plant, including a
cultivar or ecotype. An allelic variant may or may not give rise to a phenotypic change, and may or may not be expressed.
An allele can result in a detectable change in the phenotype of the trait represented by the locus. A phenotypically
silent allele can give rise to a product.

[0044] Alternatively spliced messages: Within the context of the current invention, "alternatively spliced
messages" refers to mature mRNAs originating from a single gene with variations in the number and/or identity of exons,
introns and/or intron-exon junctions.

[0045] Chimeric: The term "chimeric" is used to describe genes, as defined supra, or constructs wherein at least
two of the elements of the gene or construct, such as the promoter and the coding sequence and/or other regulatory
sequences and/or filler sequences and/or complements thereof, are heterologous to each other.
[0046] Constitutive Promoter: Promoters referred to herein as "constitutive promoters" actively promote transcription
under most, but not necessarily all, environmental conditions and states of development or cell differentiation. Examples
of constitutive promoters include the cauliflower mosaic virus (CaMV) 35S transcript initiation region and the 1' or 2'
promoter derived from T-DNA of Agrobacterium tumefaciens, and other transcription initiation regions from various
plant genes, such as the maize ubiquitin-1 promoter, known to those of skill.

[0047] Coordinately Expressed: The term "coordinately expressed," as used in the current invention, refers to
genes that are expressed at the same or a similar time and/or stage and/or under the same or similar environmental
conditions.

[0048] Domain: Domains are fingerprints or signatures that can be used to characterize protein families and/or
parts of proteins. Such fingerprints or signatures can comprise conserved (1) primary sequence, (2) secondary
structure, and/or (3) three-dimensional conformation. Generally, each domain has been associated with either a family of
proteins or motifs. Typically, these families and/or motifs have been correlated with specific in-vitro and/or in-vivo
activities. A domain can be any length, including the entirety of the sequence of a protein. Detailed descriptions of the
domains, associated families and motifs, and correlated activities of the polypeptides of the instant invention are
described below. Usually, the polypeptides with designated domain(s) can exhibit at least one activity that is exhibited by
any polypeptide that comprises the same domain(s).

[0049] Endogenous: The term "endogenous," within the context of the current invention refers to any
polynucleotide, polypeptide or protein sequence which is a natural part of a cell or organisms regenerated from said cell.

[0050] Exogenous: "Exogenous," as referred to within, is any polynucleotide, polypeptide or protein sequence,
whether chimeric or not, that is initially or subsequently introduced into the genome of an individual host cell or the
organism regenerated from said host cell by any means other than by a sexual cross. Examples of means by which
this can be accomplished are described below, and include Agrobacterium-mediated transformation (of dicots, e.g.
Salomon et al. EMBO J. 3:141 (1984); Herrera-Estrella et al. EMBO J. 2:987 (1983); of monocots, representative
papers are those by Escudero et al., Plant J. 20:355 (1996), Ishida et al., Nature Biotechnology 14:745 (1996), May
et al., Bio/Technology 13:486 (1995)), biolistic methods (Armaleo et al., Current Genetics 17:97 (1990)), electroporation,
in planta techniques, and the like. Such a plant containing the exogenous nucleic acid is referred to here as a T0 for
the primary transgenic plant and T1 for the first generation. The term "exogenous" as used herein is also intended to
encompass inserting a naturally found element into a non-naturally found location.

[0051] Filler sequence: As used herein, "filler sequence" refers to any nucleotide sequence that is inserted into a
DNA construct to evoke a particular spacing between particular components such as a promoter and a coding region
and may provide an additional attribute such as a restriction enzyme site.

[0052] Gene: The term "gene," as used in the context of the current invention, encompasses all regulatory and coding
sequence contiguously associated with a single hereditary unit with a genetic function (see SCHEMATIC 1). Genes
can include non-coding sequences that modulate the genetic function that include, but are not limited to, those that
specify polyadenylation, transcriptional regulation, DNA conformation, chromatin conformation, extent and position of
base methylation and binding sites of proteins that control all of these. Genes comprised of "exons" (coding sequences),
which may be interrupted by "introns" (non-coding sequences), encode proteins. A gene's genetic function may require
only RNA expression or protein production, or may only require binding of proteins and/or nucleic acids without
associated expression. In certain cases, genes adjacent to one another may share sequence in such a way that one gene
will overlap the other. A gene can be found within the genome of an organism, artificial chromosome, plasmid, vector,
etc., or as a separate isolated entity.

[0053] Gene Family: "Gene family" is used in the current invention to describe a group of functionally related genes,
each of which encodes a separate protein.

[0054] Heterologous sequences: "Heterologous sequences" are those that are not operatively linked or are not
contiguous to each other in nature. For example, a promoter from corn is considered heterologous to an Arabidopsis
coding region sequence. Also, a promoter from a gene encoding a growth factor from corn is considered heterologous
to a sequence encoding the corn receptor for the growth factor. Regulatory element sequences, such as UTRs or 3'
end termination sequences that do not originate in nature from the same gene as the coding sequence originates from,
are considered heterologous to said coding sequence. Elements operatively linked in nature and contiguous to each
other are not heterologous to each other. On the other hand, these same elements remain operatively linked but become
heterologous if other filler sequence is placed between them. Thus, the promoter and coding sequences of a corn gene
expressing an amino acid transporter are not heterologous to each other, but the promoter and coding sequence of a
corn gene operatively linked in a novel manner are heterologous.

[0055] Homologous gene: In the current invention, "homologous gene" refers to a gene that shares sequence
similarity with the gene of interest. This similarity may be in only a fragment of the sequence and often represents a
functional domain such as, without limitation, a DNA binding domain, a domain with tyrosine kinase
activity, or the like. The functional activities of homologous genes are not necessarily the same.

[0056] Inducible Promoter: An "inducible promoter" in the context of the current invention refers to a promoter
which is regulated under certain conditions, such as light, chemical concentration, protein concentration, conditions in
an organism, cell, or organelle, etc. A typical example of an inducible promoter, which can be utilized with the polynu-
I.B. Methods of Detection and Isolation

[0086] The polynucleotides of the invention can be utilized in a number of methods known to those skilled in the art
as probes and/or primers to isolate and detect polynucleotides, including, without limitation: Southerns, Northerns,
Branched DNA hybridization assays, polymerase chain reaction, and microarray assays, and variations thereof.
Specific methods given by way of examples, and discussed below include:

Hybridization
Methods of Mapping
Southern Blotting
Isolating cDNA from Related Organisms
Isolating and/or Identifying Orthologous Genes.

Also, the nucleic acid molecules of the invention can be used in other methods, such as high density oligonucleotide
hybridizing assays, described, for example, in U.S. Pat. Nos. 6,004,753; 5,945,306; 5,945,287; 5,945,308; 5,919,686;
5,919,661; 5,919,627; 5,874,248; 5,871,973; 5,871,971; and 5,871,930; and PCT Pub. Nos. WO 9946380; WO
9933981; WO 9933870; WO 9931252; WO 9915658; WO 9906572; WO 9858052; WO 9958672; and WO 9810858.

B.1. Hybridization 

[0087] The isolated SDFs of REF and SEQ TABLES 1 AND 2 of the present invention can be used as probes and/
or primers for detection and/or isolation of related polynucleotide sequences through hybridization. Hybridization of
one nucleic acid to another constitutes a physical property that defines the subject SDF of the invention and the identified
related sequences. Also, such hybridization imposes structural limitations on the pair. A good general discussion of
the factors for determining hybridization conditions is provided by Sambrook et al. ("Molecular Cloning, a Laboratory
Manual", 2nd ed., c. 1989 by Cold Spring Harbor Laboratory Press, Cold Spring Harbor, NY; see esp. chapters 11 and
12). Additional considerations and details of the physical chemistry of hybridization are provided by G.H. Keller and
M.M. Manak, "DNA Probes", 2nd Ed., pp. 1-25, c. 1993 by Stockton Press, New York, NY.

[0088] Depending on the stringency of the conditions under which these probes and/or primers are used,
polynucleotides exhibiting a wide range of similarity to those in REF and SEQ TABLES 1 AND 2 can be detected or isolated.
When the practitioner wishes to examine the result of membrane hybridizations under a variety of stringencies, an
efficient way to do so is to perform the hybridization under a low stringency condition, then to wash the hybridization
membrane under increasingly stringent conditions.

[0089] When using SDFs to identify orthologous genes in other species, the practitioner will preferably adjust the
amount of target DNA of each species so that, as nearly as is practical, the same number of genome equivalents are
present for each species examined. This prevents faint signals from species having large genomes, and thus small
numbers of genome equivalents per mass of DNA, from erroneously being interpreted as absence of the corresponding
gene in the genome.
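
The adjustment amounts to scaling the DNA mass in proportion to genome size, since the number of genome equivalents in a sample is its mass divided by the mass of one genome. A minimal Python sketch follows; the function name is hypothetical and the genome sizes in the example are round illustrative figures, not values from this document.

```python
def matched_dna_mass(reference_mass_ug, reference_genome_mb, target_genome_mb):
    """DNA mass of a target species holding as many genome equivalents
    as reference_mass_ug of the reference species.

    Mass per genome is proportional to genome size, so the required mass
    scales by the ratio of the two genome sizes.
    """
    return reference_mass_ug * target_genome_mb / reference_genome_mb


# Illustrative sizes only: a 100 Mb reference genome and a 2,500 Mb target.
print(matched_dna_mass(1.0, 100, 2500))  # 25.0
```

In this toy case, 25 micrograms of DNA from the larger genome would be loaded for every microgram from the smaller one.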

[0090] The probes and/or primers of the instant invention can also be used to detect or isolate nucleotides that are
"identical" to the probes or primers. Two nucleic acid sequences or polypeptides are said to be "identical" if the sequence
of nucleotides or amino acid residues, respectively, in the two sequences is the same when aligned for maximum
correspondence as described below.

[0091] Isolated polynucleotides within the scope of the invention also include allelic variants of the specific sequences
presented in REF and SEQ TABLES 1 AND 2. The probes and/or primers of the invention can also be used to detect
and/or isolate polynucleotides exhibiting at least 80% sequence identity with the sequences of REF and SEQ TABLES
1 AND 2 or fragments thereof.
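
A percent-identity figure can be computed from two sequences once they have been aligned. The Python sketch below assumes the alignment has already been produced by some other tool (gaps written as "-") and, as a further assumption, divides matches by the full alignment length; other denominators are in common use and the document's own method may differ.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent identity between two pre-aligned sequences of equal length.

    Gap characters ('-') never count as matches. The denominator is the
    alignment length, which is one convention among several.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    matches = sum(1 for a, b in zip(aligned_a, aligned_b) if a == b and a != "-")
    return 100.0 * matches / len(aligned_a)


identity = percent_identity("ATGGCTTGGA", "ATGGCATGGA")
print(identity)          # 90.0
print(identity >= 80.0)  # True
```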

[0092] With respect to nucleotide sequences, degeneracy of the genetic code provides the possibility to substitute
at least one base of the base sequence of a gene with a different base without causing the amino acid sequence of
the polypeptide produced from the gene to be changed. Hence, the DNA of the present invention may also have any
base sequence that has been changed from a sequence in REF and SEQ TABLES 1 AND 2 by substitution in
accordance with degeneracy of genetic code. References describing codon usage include: Carels et al., J. Mol. Evol. 46: 45
(1998) and Fennoy et al., Nucl. Acids Res. 21(23): 5294 (1993).
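
To make the point about degeneracy concrete, the Python sketch below translates two toy DNA sequences that differ at one base (GCT versus GCC, both alanine codons in the standard genetic code) and shows that the encoded peptide is unchanged. The sequences and function name are illustrative assumptions; the codon table is the standard one, built in the conventional T, C, A, G order.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(dna):
    """Translate a DNA coding sequence codon by codon ('*' marks a stop)."""
    usable = len(dna) - len(dna) % 3   # ignore a trailing partial codon
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


original = "ATGGCTTGGTAA"   # Met Ala Trp stop
variant = "ATGGCCTGGTAA"    # third base of the Ala codon changed, T to C

print(translate(original))                         # MAW*
print(translate(variant))                          # MAW*
print(translate(original) == translate(variant))   # True
```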

B.2. Mapping 

[0093] The isolated SDF DNA of the invention can be used to create various types of genetic and physical maps of
the genome of corn, Arabidopsis, soybean, rice, wheat, or other plants. Some SDFs may be absolutely associated
with particular phenotypic traits, allowing construction of gross genetic maps. While not all SDFs will immediately be
multiple phenotypic traits.

B.3 Southern Blot Hybridization 

[0101] The sequences from REF and SEQ TABLES 1 AND 2 can be used as probes for various hybridization
techniques. These techniques are useful for detecting target polynucleotides in a sample or for determining whether
transgenic plants, seeds or host cells harbor a gene or sequence of interest and thus might be expected to exhibit a particular
trait or phenotype.

[0102] In addition, the SDFs from the invention can be used to isolate additional members of gene families from the
same or different species and/or orthologous genes from the same or different species. This is accomplished by
hybridizing an SDF to, for example, a Southern blot containing the appropriate genomic DNA or cDNA. Given the resulting
hybridization data, one of ordinary skill in the art could distinguish and isolate the correct DNA fragments by size,
restriction sites, sequence and stated hybridization conditions from a gel or from a library.

[0103] Identification and isolation of orthologous genes from closely related species and alleles within a species is
particularly desirable because of their potential for crop improvement. Many important crop traits, such as the solid
content of tomatoes, result from the combined interactions of the products of several genes residing at different loci in
the genome. Generally, alleles at each of these loci can make quantitative differences to the trait. By identifying and
isolating numerous alleles for each locus from within or different species, transgenic plants with various combinations
of alleles can be created and the effects of the combinations measured. Once a more favorable allele combination has
been identified, crop improvement can be accomplished either through biotechnological means or by directed
conventional breeding programs (Tanksley et al. Science 277: 1063 (1997)).

[0104] The results from hybridizations of the SDFs of the invention to, for example, Southern blots containing DNA
from another species can also be used to generate restriction fragment maps for the corresponding genomic regions.
These maps provide additional information about the relative positions of restriction sites within fragments, further
distinguishing mapped DNA from the remainder of the genome.

[0105] Physical maps can be made by digesting genomic DNA with different combinations of restriction enzymes. 
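
As a simplified picture of what a digest yields, the Python sketch below cuts a linear toy sequence at every EcoRI site (GAATTC, cut after the first G) and lists the fragment lengths. The function name, the default enzyme, and the sequence are illustrative assumptions; real mapping compares fragment patterns from several enzymes, alone and in combination.

```python
def digest(sequence, site="GAATTC", cut_offset=1):
    """Return fragment lengths from cutting a linear sequence at each site.

    cut_offset is the number of bases of the recognition site left on the
    upstream fragment (1 for EcoRI, which cuts G^AATTC).
    """
    cuts = []
    pos = sequence.find(site)
    while pos != -1:
        cuts.append(pos + cut_offset)
        pos = sequence.find(site, pos + 1)
    bounds = [0] + cuts + [len(sequence)]
    return [bounds[i + 1] - bounds[i] for i in range(len(bounds) - 1)]


toy = "AAAA" + "GAATTC" + "CCCCCCCC" + "GAATTC" + "TT"
print(digest(toy))  # [5, 14, 7]
```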
[0106] Probes for Southern blotting to distinguish individual restriction fragments can range in size from 15 to 20
nucleotides to several thousand nucleotides. More preferably, the probe is 100 to 1,000 nucleotides long for identifying
members of a gene family when it is found that repetitive sequences would complicate the hybridization. For identifying
an entire corresponding gene in another species, the probe is more preferably the length of the gene, typically 2,000
to 10,000 nucleotides, but probes 50-1,000 nucleotides long might be used. Some genes, however, might require
probes up to 1,500 nucleotides long or overlapping probes constituting the full-length sequence to span their lengths.
[0107] Also, while it is preferred that the probe be homogeneous with respect to its sequence, it is not necessary.
For example, as described below, a probe representing members of a gene family having diverse sequences can be
generated using PCR to amplify genomic DNA or RNA templates using primers derived from SDFs that include
sequences that define the gene family.

[0108] For identifying corresponding genes in another species, the next most preferable probe is a cDNA spanning
the entire coding sequence, which allows all of the mRNA-coding fragment of the gene to be identified. Probes for
Southern blotting can easily be generated from SDFs by making primers having the sequence at the ends of the SDF
and using corn or Arabidopsis genomic DNA as a template. In instances where the SDF includes sequence conserved
among species, primers including the conserved sequence can be used for PCR with genomic DNA from a species of
interest to obtain a probe. Similarly, if the SDF includes a domain of interest, that fragment of the SDF can be used to
make primers and, with appropriate template DNA, used to make a probe to identify genes containing the domain.
Alternatively, the PCR products can be resolved, for example by gel electrophoresis, and cloned and/or sequenced.
Using Southern hybridization, the variants of the domain among members of a gene family, both within and across
species, can be examined.
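
Deriving primers from the ends of an SDF can be sketched as follows: the forward primer is the first stretch of the sequence, and the reverse primer is the reverse complement of the last stretch. The function names, the default primer length of 20, and the toy sequence are assumptions for illustration; a practical design would also check melting temperature and specificity.

```python
def reverse_complement(seq):
    """Reverse complement of an uppercase DNA sequence (A, C, G, T only)."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def end_primers(sdf, length=20):
    """Return (forward, reverse) primers taken from the two ends of an SDF."""
    forward = sdf[:length]                        # same strand as the SDF
    reverse = reverse_complement(sdf[-length:])   # anneals to the SDF strand
    return forward, reverse


sdf = "ATGGCTTGGAAACCCGGGTTTTAA"   # toy 24-base sequence
print(end_primers(sdf, length=6))  # ('ATGGCT', 'TTAAAA')
```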

B.4.1 Isolating DNA from Related Organisms 

[0109] The SDFs of the invention can be used to isolate the corresponding DNA from other organisms. Either cDNA
or genomic DNA can be isolated. For isolating genomic DNA, a lambda, cosmid, BAC or YAC, or other large insert
genomic library from the plant of interest can be constructed using standard molecular biology techniques as described
in detail by Sambrook et al. 1989 (Molecular Cloning: A Laboratory Manual, 2nd ed. Cold Spring Harbor Laboratory
Press, New York) and by Ausubel et al. 1992 (Current Protocols in Molecular Biology, Greene Publishing, New York).

[0110] To screen a phage library, for example, recombinant lambda clones are plated out on appropriate bacterial
medium using an appropriate E. coli host strain. The resulting plaques are lifted from the plates using nylon or
nitrocellulose filters. The plaque lifts are processed through denaturation, neutralization, and washing treatments following
the standard protocols outlined by Ausubel et al. (1992). The plaque lifts are hybridized to either radioactively labeled
or non-radioactively labeled SDF DNA at room temperature for about 16 hours, usually in the presence of 50%
formamide and 5X SSC (sodium chloride and sodium citrate) buffer and blocking reagents. The plaque lifts are then washed
at 42°C with 1% Sodium Dodecyl Sulfate (SDS) and at a particular concentration of SSC. The SSC concentration used
is dependent upon the stringency at which hybridization occurred in the initial Southern blot analysis performed. For
example, if a fragment hybridized under medium stringency (e.g., Tm - 20°C), then this condition is maintained or
preferably adjusted to a less stringent condition (e.g., Tm^C) to wash the plaque lifts. Positive clones show
detectable hybridization, e.g., by exposure to X-ray films or chromogen formation. The positive clones are then subsequently
isolated for purification using the same general protocol outlined above. Once the clone is purified, restriction analysis
can be conducted to narrow the region corresponding to the gene of interest. The restriction analysis and succeeding
subcloning steps can be done using procedures described by, for example, Sambrook et al. (1989) cited above.

[0111] The procedures outlined for the lambda library are essentially similar to those used for YAC library screening,
except that the YAC clones are harbored in bacterial colonies. The YAC clones are plated out at reasonable density
on nitrocellulose or nylon filters supported by appropriate bacterial medium in petri plates. Following the growth of the
bacterial clones, the filters are processed through the denaturation, neutralization, and washing steps following the
procedures of Ausubel et al. 1992. The same hybridization procedures for lambda library screening are followed.

[0112] To isolate cDNA, similar procedures using appropriately modified vectors are employed. For instance, the
library can be constructed in a lambda vector appropriate for cloning cDNA such as λgt11. Alternatively, the cDNA
library can be made in a plasmid vector. cDNA for cloning can be prepared by any of the methods known in the art,
but is preferably prepared as described above. Preferably, a cDNA library will include a high proportion of full-length
clones.

B.5. Isolating and/or Identifying Orthologous Genes

[0113] Probes and primers of the invention can be used to identify and/or isolate polynucleotides related to those in
REF and SEQ TABLES 1 AND 2. Related polynucleotides are those that are native to other plant organisms and exhibit
either similar sequence or encode polypeptides with similar biological activity. One specific example is an orthologous
gene. Orthologous genes have the same functional activity. As such, orthologous genes may be distinguished from
homologous genes. The percentage of identity is a function of evolutionary separation and, in closely related species,
the percentage of identity can be 98 to 100%. The amino acid sequence of a protein encoded by an orthologous gene
can be less than 75% identical, but tends to be at least 75% or at least 80% identical, more preferably at least 90%,
most preferably at least 95% identical to the amino acid sequence of the reference protein. To find orthologous genes,
the probes are hybridized to nucleic acids from a species of interest under low stringency conditions, preferably one
where sequences containing as much as 40-45% mismatches will be able to hybridize. This condition is established
by Tm - 40°C to Tm - 48°C (see below). Blots are then washed under conditions of increasing stringency. It is preferable
that the wash stringency be such that sequences that are 85 to 100% identical will hybridize. More preferably, sequences
90 to 100% identical will hybridize and most preferably only sequences greater than 95% identical will hybridize. One
of ordinary skill in the art will recognize that, due to degeneracy in the genetic code, amino acid sequences that are
identical can be encoded by DNA sequences as little as 67% identical or less. Thus, it is preferable, for example, to
make an overlapping series of shorter probes, on the order of 24 to 45 nucleotides, and individually hybridize them to
the same arrayed library to avoid the problem of degeneracy introducing large numbers of mismatches.
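
The degeneracy point above can be illustrated with a short sketch. It compares two hypothetical 15-nucleotide coding sequences that differ at every third codon position yet encode the same peptide, and then tiles a sequence into an overlapping series of short probes. The example sequences, the function names, and the 12-nucleotide step between probes are assumptions chosen for illustration; none of them is taken from REF and SEQ TABLES 1 AND 2.

```python
# Standard genetic code, with codons ordered T, C, A, G at each position.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(dna):
    """Translate an in-frame DNA coding sequence into one-letter amino acids."""
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna) - 2, 3))


def percent_identity(a, b):
    """Percent of matching positions between two equal-length, pre-aligned sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / len(a)


def tile_probes(seq, probe_len=24, step=12):
    """Return overlapping probes of probe_len nucleotides, one every `step` nucleotides."""
    return [seq[i:i + probe_len] for i in range(0, len(seq) - probe_len + 1, step)]


# Hypothetical sequences: every codon differs only at its third position.
seq_a = "GGAGCAGTAACACCA"
seq_b = "GGCGCCGTCACCCCC"
print(translate(seq_a), translate(seq_b))        # both print GAVTP
print(round(percent_identity(seq_a, seq_b), 1))  # 66.7
```

Two thirds of the positions match even though the encoded peptides are identical, which is the roughly 67% figure mentioned above. Short probes help because each one spans fewer codons and so fewer potentially mismatched third positions.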

[0114] As evolutionary divergence increases, genome sequences also tend to diverge. Thus, one of skill will recog- 
nize that searches for orthologous genes between more divergent species will require the use of lower stringency 
conditions compared to searches between closely related species. Also, degeneracy of the genetic code is more of a 
problem for searches in the genome of a species more distant evolutionarily from the species that is the source of the 

SDF probe sequences.

[0115] The SDFs of the invention can also be used as probes to search for genes that are related to the SDF within
a species. Such related genes are typically considered to be members of a gene family. In such a case, the sequence
similarity will often be concentrated into one or a few fragments of the sequence. The fragments of similar sequence
that define the gene family typically encode a fragment of a protein or RNA that has an enzymatic or structural function.
The percentage of identity in the amino acid sequence of the domain that defines the gene family is preferably at least
70%, more preferably 80 to 95%, most preferably 85 to 99%. To search for members of a gene family within a species,
a low stringency hybridization is usually performed, but this will depend upon the size, distribution and degree of sequence
divergence of domains that define the gene family. SDFs encompassing regulatory regions can be used to
identify coordinately expressed genes by using the regulatory region sequence of the SDF as a probe.

[0116] In the instances where the SDFs are identified as being expressed from genes that confer a particular phenotype,
then the SDFs can also be used as probes to assay plants of different species for those phenotypes.

I.C. Methods to Inhibit Gene Expression

[0117] The nucleic acid molecules of the present invention can be used to inhibit gene transcription and/or translation.
Examples of such methods include, without limitation:

Antisense Constructs;
Ribozyme Constructs;
Chimeraplast Constructs;
Co-Suppression;
Transcriptional Silencing; and
Other Methods to Inhibit Gene Expression.

C.1 Antisense 

[0118] In some instances it is desirable to suppress expression of an endogenous or exogenous gene. A well-known
instance is the FLAVOR-SAVOR™ tomato, in which the gene encoding ACC synthase is inactivated by an antisense
approach, thus delaying softening of the fruit after ripening. See for example, U.S. Patent No. 5,859,330; U.S. Patent
No. 5,723,766; Oeller et al., Science, 254:437-439 (1991); and Hamilton et al., Nature, 346:284-287 (1990). Also, timing
of flowering can be controlled by suppression of the FLOWERING LOCUS C (FLC); high levels of this transcript are
associated with late flowering, while absence of FLC is associated with early flowering (S.D. Michaels et al., Plant Cell
11:949 (1999)). Also, the transition of apical meristem from production of leaves with associated shoots to flowering is
regulated by TERMINAL FLOWER 1, APETALA1 and LEAFY. Thus, when it is desired to induce a transition from shoot
production to flowering, it is desirable to suppress TFL1 expression (S.J. Liljegren, Plant Cell 11:1007 (1999)). As
another instance, arrested ovule development and female sterility result from suppression of the ethylene forming
enzyme but can be reversed by application of ethylene (D. De Martinis et al., Plant Cell 11:1061 (1999)). The ability
to manipulate female fertility of plants is useful in increasing fruit production and creating hybrids.
[0119] In the case of polynucleotides used to inhibit expression of an endogenous gene, the introduced sequence 
need not be perfectly identical to a sequence of the target endogenous gene. The introduced polynucleotide sequence 
will typically be at least substantially identical to the target endogenous sequence. 

[0120] Some polynucleotide SDFs in REF and SEQ TABLES 1 AND 2 represent sequences that are expressed in
corn, wheat, rice, soybean, Arabidopsis and/or other plants. Thus the invention includes using these sequences to generate
antisense constructs to inhibit translation and/or degradation of transcripts of said SDFs, typically in a plant cell.
[0121] To accomplish this, a polynucleotide segment from the desired gene that can hybridize to the mRNA expressed
from the desired gene (the "antisense segment") is operably linked to a promoter such that the antisense strand of RNA
will be transcribed when the construct is present in a host cell. A regulated promoter can be used in the construct to
control transcription of the antisense segment so that transcription occurs only under desired circumstances.
[0122] The antisense segment to be introduced generally will be substantially identical to at least a fragment of the 
endogenous gene or genes to be repressed. The sequence, however, need not be perfectly identical to inhibit expres- 
sion. Further, the antisense product may hybridize to the untranslated region instead of or in addition to the coding 
sequence of the gene. The vectors of the present invention can be designed such that the inhibitory effect applies to 
other proteins within a family of genes exhibiting homology or substantial homology to the target gene. 
[0123] For antisense suppression, the introduced antisense segment sequence also need not be full length relative 
to either the primary transcription product or the fully processed mRNA. Generally, a higher percentage of sequence 
identity can be used to compensate for the use of a shorter sequence. Furthermore, the introduced sequence need
not have the same intron or exon pattern, and homology of noncoding segments may be equally effective. Normally,
a sequence of between about 30 or 40 nucleotides and the full length of the transcript can be used, though a sequence 
of at least about 100 nucleotides is preferred, a sequence of at least about 200 nucleotides is more preferred, and a 
sequence of at least about 500 nucleotides is especially preferred. 
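
As a rough sketch of the antisense design described above, the code below takes a stretch of a sense-strand cDNA, returns its reverse complement (the sequence that, placed downstream of a promoter, would be transcribed into RNA complementary to the mRNA), and labels the segment length against the preferences stated in this section. The function names and the example sequence are hypothetical, and the length cut-offs are the approximate "about" values from the text, treated here as hard thresholds only for illustration.

```python
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(dna):
    """Return the reverse complement of a DNA sequence."""
    return dna.translate(COMPLEMENT)[::-1]


def antisense_segment(sense_cdna, start, end):
    """Reverse complement of sense_cdna[start:end], for cloning behind a promoter."""
    return reverse_complement(sense_cdna[start:end])


def length_preference(n_nucleotides):
    """Map a segment length to the approximate preference tiers given in the text."""
    if n_nucleotides >= 500:
        return "especially preferred"
    if n_nucleotides >= 200:
        return "more preferred"
    if n_nucleotides >= 100:
        return "preferred"
    if n_nucleotides >= 30:
        return "usable"
    return "below the stated range"


# Hypothetical sense-strand fragment; only the slice 2..6 ("ATGC") is used.
print(antisense_segment("AAATGCCC", 2, 6))  # GCAT
print(length_preference(250))               # more preferred
```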

C.2. Ribozymes

[0124] It is also contemplated that gene constructs representing ribozymes and based on the SDFs in REF AND 
SEQ TABLES 1 AND 2 are an object of the invention. Ribozymes can also be used to inhibit expression of genes by 
suppressing the translation of the mRNA into a polypeptide. It is possible to design ribozymes that specifically pair with 
virtually any target RNA and cleave the phosphodiester backbone at a specific location, thereby functionally inactivating
the target RNA. In carrying out this cleavage, the ribozyme is not itself altered, and is thus capable of recycling and
cleaving other molecules, making it a true enzyme. The inclusion of ribozyme sequences within antisense RNAs confers
RNA-cleaving activity upon them, thereby increasing the activity of the constructs.
[0125] A number of classes of ribozymes have been identified. One class of ribozymes is derived from a number of
small circular RNAs, which are capable of self-cleavage and replication in plants. The RNAs replicate either alone (viroid
RNAs) or with a helper virus (satellite RNAs). Examples include RNAs from avocado sunblotch viroid and the satellite
RNAs from tobacco ringspot virus, lucerne transient streak virus, velvet tobacco mottle virus, Solanum nodiflorum mottle
virus and subterranean clover mottle virus. The design and use of target RNA-specific ribozymes is described in Haseloff
et al., Nature, 334:585 (1988).

[0126] Like the antisense constructs above, the ribozyme sequence fragment necessary for pairing need not be
identical to the target nucleotides to be cleaved, nor identical to the sequences in REF AND SEQ TABLES 1 AND 2.
Ribozymes may be constructed by combining the ribozyme sequence and some fragment of the target gene which
would allow recognition of the target gene mRNA by the resulting ribozyme molecule. Generally, the sequence in the
ribozyme capable of binding to the target sequence exhibits a percentage of sequence identity with at least 80%,
preferably with at least 85%, more preferably with at least 90% and most preferably with at least 95%, even more
preferably, with at least 96%, 97%, 98% or 99% sequence identity to some fragment of a sequence in REF AND SEQ
TABLES 1 AND 2 or the complement thereof. The ribozyme can be equally effective in inhibiting mRNA translation by
cleaving either in the untranslated or coding regions. Generally, a higher percentage of sequence identity can be used
to compensate for the use of a shorter sequence. Furthermore, the introduced sequence need not have the same
intron or exon pattern, and homology of non-coding segments may be equally effective.


C.3. Chimeraplasts


[0127] The SDFs of the invention, such as those described by the REF and SEQ Tables, can also be used to construct
chimeraplasts that can be introduced into a cell to produce at least one specific nucleotide change in a sequence
corresponding to the SDF of the invention. A chimeraplast is an oligonucleotide comprising DNA and/or RNA that
specifically hybridizes to a target region in a manner which creates a mismatched base-pair. This mismatched base-pair
signals the cell's repair enzyme machinery which acts on the mismatched region resulting in the replacement,
insertion or deletion of designated nucleotide(s). The altered sequence is then expressed by the cell's normal cellular
mechanisms. Chimeraplasts can be designed to repair mutant genes, modify genes, introduce site-specific mutations,
and/or act to interrupt or alter normal gene function (US Pat. Nos. 6,010,907 and 6,004,804; and PCT Pub. Nos.
WO99/58723 and WO99/07865).
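
At the sequence level only, the single-mismatch idea can be sketched as follows. The code copies a window of a target sequence and changes one designated base, so that the resulting oligonucleotide sequence would pair with the target everywhere except at that position. The function name, the 12-nucleotide flank, and the example sequence are assumptions for illustration; strand choice and the actual DNA/RNA chemistry of a chimeraplast are outside this sketch.

```python
def design_mismatch_oligo(target, position, new_base, flank=12):
    """Return (oligo_sequence, offset) for a window around `position`.

    The oligo sequence equals the target window except at `offset`,
    where `new_base` replaces the original base.
    """
    if not 0 <= position < len(target):
        raise IndexError("position lies outside the target sequence")
    if target[position] == new_base:
        raise ValueError("new_base must differ from the base being replaced")
    start = max(0, position - flank)
    end = min(len(target), position + flank + 1)
    window = target[start:end]
    offset = position - start
    oligo = window[:offset] + new_base + window[offset + 1:]
    return oligo, offset


# Hypothetical target: change the C at index 5 to T, with 3 nucleotides of flank.
print(design_mismatch_oligo("ACGTACGTACGT", 5, "T", flank=3))  # ('GTATGTA', 3)
```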


C.4. Sense Suppression 


[0128] The SDFs of REF and SEQ TABLES 1 AND 2 of the present invention are also useful to modulate gene
expression by sense suppression. Sense suppression represents another method of gene suppression by introducing
at least one exogenous copy or fragment of the endogenous sequence to be suppressed.

[0129] Introduction of expression cassettes in which a nucleic acid is configured in the sense orientation with respect 
to the promoter into the chromosome of a plant or by a self-replicating virus has been shown to be an effective means 
by which to induce degradation of mRNAs of target genes. For an example of the use of this method to modulate 
expression of endogenous genes see Napoli et al., The Plant Cell 2:279 (1990), and U.S. Patents Nos. 5,034,323,
5,231,020, and 5,283,184. Inhibition of expression may require some transcription of the introduced sequence.

[0130] For sense suppression, the introduced sequence generally will be substantially identical to the endogenous 
sequence intended to be inactivated. The minimal percentage of sequence identity will typically be greater than about 
65% but a higher percentage of sequence identity might exert a more effective reduction in the level of normal gene 
products. Sequence identity of more than about 80% is preferred, though about 95% to absolute identity would be
most preferred. As with antisense regulation, the effect would likely apply to any other proteins within a similar family
of genes exhibiting homology or substantial homology to the suppressing sequence. 

C.5. Transcriptional Silencing 

[0131] The nucleic acid sequences of the invention, including the SDFs of REF and SEQ TABLES 1 AND 2, and
fragments thereof, contain sequences that can be inserted into the genome of an organism resulting in transcriptional
silencing. Such regulatory sequences need not be operatively linked to coding sequences to modulate transcription of
a gene. Specifically, a promoter sequence without any other element of a gene can be introduced into a genome to
transcriptionally silence an endogenous gene (see, for example, Vaucheret, H. et al. (1998) The Plant Journal 16:
651-659). As another example, triple helices can be formed using oligonucleotides based on sequences from REF
AND SEQ TABLES 1 AND 2, fragments thereof, and substantially similar sequence thereto. The oligonucleotide can
be delivered to the host cell and can bind to the promoter in the genome to form a triple helix and prevent transcription.
An oligonucleotide of interest is one that can bind to the promoter and block binding of a transcription factor to the
promoter. In such a case, the oligonucleotide can be complementary to the sequences of the promoter that interact
with transcription binding factors.

C.6. Other Methods to Inhibit Gene Expression 

[0132] Yet another means of suppressing gene expression is to insert a polynucleotide into the gene of interest to 
disrupt transcription or translation of the gene. 

[0133] Low frequency homologous recombination can be used to target a polynucleotide insert to a gene by flanking
the polynucleotide insert with sequences that are substantially similar to the gene to be disrupted. Sequences from
REF AND SEQ TABLES 1 AND 2, fragments thereof, and substantially similar sequence thereto can be used for
homologous recombination.

[0134] In addition, random insertion of polynucleotides into a host cell genome can also be used to disrupt the gene
of interest. Azpiroz-Leehan et al., Trends in Genetics 13:152 (1997). In this method, screening for clones from a library
containing random insertions is preferred for identifying those that have polynucleotides inserted into the gene of interest.
Such screening can be performed using probes and/or primers described above based on sequences from REF AND
SEQ TABLES 1 AND 2, fragments thereof, and substantially similar sequence thereto. The screening can also be
performed by selecting clones or R1 plants having a desired phenotype.

I.D. Methods of Functional Analysis

[0135] The constructs described in the methods under I.C. above can be used to determine the function of the
polypeptide encoded by the gene that is targeted by the constructs.

[0136] Down-regulating the transcription and translation of the targeted gene in the host cell or organisms, such as 
a plant, may produce phenotypic changes as compared to a wild-type cell or organism. In addition, in vitro assays can 
be used to determine if any biological activity, such as calcium flux, DNA transcription, nucleotide incorporation, etc.,
is being modulated by the down-regulation of the targeted gene.

[0137] Coordinated regulation of sets of genes, e.g., those contributing to a desired polygenic trait, is sometimes
necessary to obtain a desired phenotype. SDFs of the invention representing transcription activation and DNA binding
domains can be assembled into hybrid transcriptional activators. These hybrid transcriptional activators can be used
with their corresponding DNA elements (i.e., those bound by the DNA-binding SDFs) to effect coordinated expression
of desired genes (J.J. Schwarz et al., Mol. Cell. Biol. 12:266 (1992), A. Martinez et al., Mol. Gen. Genet. 261:546
(1999)).

[0138] The SDFs of the invention can also be used in the two-hybrid genetic systems to identify networks of protein-protein
interactions (L. McAlister-Henn et al., Methods 19:330 (1999), J.C. Hu et al., Methods 20:80 (2000), M. Golovkin
et al., J. Biol. Chem. 274:36428 (1999), K. Ichimura et al., Biochem. Biophys. Res. Comm. 253:532 (1998)). The SDFs
of the invention can also be used in various expression display methods to identify important protein-DNA interactions
(e.g. B. Luo et al., J. Mol. Biol. 266:479 (1997)).

I.E. Promoters 

[0139] The SDFs of the invention are also useful as structural or regulatory sequences in a construct for modulating
the expression of the corresponding gene in a plant or other organism, e.g. a symbiotic bacterium. For example, promoter
sequences associated with SDFs of REF and SEQ TABLES 1 AND 2 of the present invention can be useful in
directing expression of coding sequences either as constitutive promoters or to direct expression in particular cell types,
tissues, or organs or in response to environmental stimuli.

[0140] With respect to the SDFs of the present invention, a promoter is likely to be a relatively small portion of a
genomic DNA (gDNA) sequence located in the first 2000 nucleotides upstream from an initial exon identified in a gDNA
sequence or initial "ATG" or methionine codon or translational start site in a corresponding cDNA sequence. Such
promoters are more likely to be found in the first 1000 nucleotides upstream of an initial ATG or methionine codon or
translational start site of a cDNA sequence corresponding to a gDNA sequence. In particular, the promoter is usually
located upstream of the transcription start site. The fragments of a particular gDNA sequence that function as elements
of a promoter in a plant cell will preferably be found to hybridize to gDNA sequences presented and described in
REF AND SEQ TABLES 1 AND 2 at medium or high stringency, relevant to the length of the probe and its base
composition.
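
The upstream windows mentioned above can be pulled out of a genomic sequence with a few lines of code. This is a minimal sketch that assumes the gene lies on the forward strand of the supplied sequence and that the index of the initial ATG is already known; the function name and the toy sequence are hypothetical.

```python
def upstream_region(gdna, start_codon_index, length=2000):
    """Return up to `length` nucleotides immediately upstream of the initial ATG.

    Assumes a forward-strand gene; fewer nucleotides are returned when the
    sequence does not extend far enough upstream.
    """
    codon = gdna[start_codon_index:start_codon_index + 3].upper()
    if codon != "ATG":
        raise ValueError("start_codon_index does not point at an ATG")
    begin = max(0, start_codon_index - length)
    return gdna[begin:start_codon_index]


# Toy example: 50 nucleotides of upstream sequence followed by ATG.
toy_gdna = "C" * 50 + "ATG" + "GGG"
candidate_2000 = upstream_region(toy_gdna, 50)          # whole 50-nt upstream stretch
candidate_20 = upstream_region(toy_gdna, 50, length=20)  # nearest 20 nt only
print(len(candidate_2000), len(candidate_20))            # 50 20
```

Calling the function with `length=1000` gives the narrower window in which, according to the text, promoters are more likely to be found.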

[0141] Promoters are generally modular in nature. Promoters can consist of a basal promoter that functions as a site
for assembly of a transcription complex comprising an RNA polymerase, for example RNA polymerase II. A typical
transcription complex will include additional factors such as TFIIB, TFIID, and TFIIE. Of these, TFIID appears to be the
only one to bind DNA directly. The promoter might also contain one or more enhancers and/or suppressors that function

The antibodies are also useful for examining the production level of proteins in various tissues, for example in a wild-type
plant or following genetic manipulation of a plant, by methods such as Western blotting.

[0163] Antibodies of the present invention, both polyclonal and monoclonal, may be prepared by conventional methods.
In general, the polypeptides of the invention are first used to immunize a suitable animal, such as a mouse, rat,
rabbit, or goat. Rabbits and goats are preferred for the preparation of polyclonal sera due to the volume of serum
obtainable, and the availability of labeled anti-rabbit and anti-goat antibodies as detection reagents. Immunization is
generally performed by mixing or emulsifying the protein in saline, preferably in an adjuvant such as Freund's complete
adjuvant, and injecting the mixture or emulsion parenterally (generally subcutaneously or intramuscularly). A dose of
50-200 µg/injection is typically sufficient. Immunization is generally boosted 2-6 weeks later with one or more injections
of the protein in saline, preferably using Freund's incomplete adjuvant. One may alternatively generate antibodies by
in vitro immunization using methods known in the art, which for the purposes of this invention is considered equivalent
to in vivo immunization.

[0164] Polyclonal antiserum is obtained by bleeding the immunized animal into a glass or plastic container, incubating
the blood at 25°C for one hour, followed by incubating the blood at 4°C for 2-18 hours. The serum is recovered by
centrifugation (e.g., 1,000xg for 10 minutes). About 20-50 ml per bleed may be obtained from rabbits.
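
Because centrifuges are often set in revolutions per minute rather than in multiples of g, the 1,000xg figure above has to be converted for a given rotor. The sketch below uses the commonly quoted relation RCF = 1.118 x 10^-5 x r x rpm^2, with r the rotor radius in centimetres. The 10 cm radius is an assumed example value, not something stated in this document, and the function names are hypothetical.

```python
import math

# Constant relating relative centrifugal force to rpm squared
# when the rotor radius is expressed in centimetres.
RCF_CONSTANT = 1.118e-5


def rcf_to_rpm(rcf, radius_cm):
    """Rotor speed (rpm) needed to reach a given relative centrifugal force."""
    if rcf <= 0 or radius_cm <= 0:
        raise ValueError("rcf and radius_cm must be positive")
    return math.sqrt(rcf / (RCF_CONSTANT * radius_cm))


def rpm_to_rcf(rpm, radius_cm):
    """Relative centrifugal force (x g) produced at a given rotor speed."""
    return RCF_CONSTANT * radius_cm * rpm ** 2


# Assumed rotor radius of 10 cm; the result is roughly 3,000 rpm.
print(round(rcf_to_rpm(1000, 10.0)))
```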

[0165] Monoclonal antibodies are prepared using the method of Kohler and Milstein, Nature 256:495 (1975), or
modification thereof. Typically, a mouse or rat is immunized as described above. However, rather than bleeding the
animal to extract serum, the spleen (and optionally several large lymph nodes) is removed and dissociated into single
cells. If desired, the spleen cells can be screened (after removal of nonspecifically adherent cells) by applying a cell
suspension to a plate, or well, coated with the protein antigen. B-cells producing membrane-bound immunoglobulin
specific for the antigen bind to the plate, and are not rinsed away with the rest of the suspension. Resulting B-cells, or
all dissociated spleen cells, are then induced to fuse with myeloma cells to form hybridomas, and are cultured in a
selective medium (e.g., hypoxanthine, aminopterin, thymidine medium, "HAT"). The resulting hybridomas are plated by
limiting dilution, and are assayed for the production of antibodies which bind specifically to the immunizing antigen
(and which do not bind to unrelated antigens). The selected Mab-secreting hybridomas are then cultured either in vitro
(e.g., in tissue culture bottles or hollow fiber reactors), or in vivo (as ascites in mice).

[0166] Other methods for sustaining antibody-producing B-cell clones, such as by EBV transformation, are known. 
[0167] If desired, the antibodies (whether polyclonal or monoclonal) may be labeled using conventional techniques.
Suitable labels include fluorophores, chromophores, radioactive atoms (particularly 32P and 125I), electron-dense reagents,
enzymes, and ligands having specific binding partners. Enzymes are typically detected by their activity. For
example, horseradish peroxidase is usually detected by its ability to convert 3,3',5,5'-tetramethylbenzidine (TMB) to a
blue pigment, quantifiable with a spectrophotometer.

A.2 In Vitro Applications of Polypeptides


[0168] Some polypeptides of the invention will have enzymatic activities that are useful in vitro. For example, the
soybean trypsin inhibitor (Kunitz) family is one of the numerous families of proteinase inhibitors. It comprises plant
proteins which have inhibitory activity against serine proteinases from the trypsin and subtilisin families, thiol proteinases
and aspartic proteinases. Thus, these peptides find in vitro use in protein purification protocols and perhaps in
therapeutic settings requiring topical application of protease inhibitors.

[0169] Delta-aminolevulinic acid dehydratase (EC 4.2.1.24) (ALAD) catalyzes the second step in the biosynthesis
of heme, the condensation of two molecules of 5-aminolevulinate to form porphobilinogen, and is also involved in chlorophyll
biosynthesis (Kaczor et al. (1994) Plant Physiol. 104: 1411-7; Smith (1988) Biochem. J. 249: 423-8; Schneider
(1976) Z. Naturforsch. [C] 31: 55-63). Thus, ALAD proteins can be used as catalysts in synthesis of heme derivatives.
Enzymes of biosynthetic pathways generally can be used as catalysts for in vitro synthesis of the compounds representing
products of the pathway.

[0170] Polypeptides encoded by SDFs of the invention can be engineered to provide purification reagents to identify 
and purify additional polypeptides that bind to them. This allows one to identify proteins that function as multimers or 
elucidate signal transduction or metabolic pathways. In the case of DNA binding proteins, the polypeptide can be used 
in a similar manner to identify the DNA determinants of specific binding (S. Pierrou et al., Anal. Biochem. 229:99 (1995),
S. Chusacultanachai et al., J. Biol. Chem. 274:23591 (1999), Q. Lin et al., J. Biol. Chem. 272:27274 (1997)).

II.B. POLYPEPTIDE VARIANTS, FRAGMENTS, AND FUSIONS

[0171] Generally, variants, fragments, or fusions of the polypeptides encoded by the maximum length sequence
(MLS) can exhibit at least one of the activities of the identified domains and/or related polypeptides described in Sections
(C) and (D) of REF TABLES 1 and 2 corresponding to the MLS of interest.
1. (AAA) AAA-protein family signature

containing two AAA domains:

- Mammalian and drosophila NSF (N-ethylmaleimide-sensitive fusion protein) and the fungal homolog, SEC18.
These proteins are involved in intracellular transport between the endoplasmic reticulum and Golgi, as well as
between different Golgi cisternae.
- Mammalian transitional endoplasmic reticulum ATPase (previously known as p97 or VCP) which is involved in the
transfer of membranes from the endoplasmic reticulum to the golgi apparatus. This protein forms a ring-shaped
homooligomer composed of six subunits. The yeast homolog is CDC48 and it may play a role in spindle pole
proliferation.
- Yeast protein PAS1, essential for peroxisome assembly, and the related protein PAS1 from Pichia pastoris.
- Yeast protein AFG2.
- Sulfolobus acidocaldarius protein SAV and Halobacterium salinarium cdcH which may be part of a transduction
pathway connecting light to cell division.

[0182] Proteins containing a single AAA domain:

- Escherichia coli and other bacteria ftsH (or hflB) protein. FtsH is an ATP-dependent zinc metallopeptidase that
seems to degrade the heat-shock sigma-32 factor.

[0183] It is an integral membrane protein with a large cytoplasmic C-terminal domain that contains both the AAA and
the protease domains.

- Yeast protein YME1, a protein important for maintaining the integrity of the mitochondrial compartment. YME1 is
also a zinc-dependent protease.
- Yeast protein AFG3 (or YTA10). This protein also seems to contain an AAA domain followed by a zinc-dependent
protease domain.

[0184] Subunits from the regulatory complex of the 26S proteasome [6] which is involved in the ATP-dependent
degradation of ubiquitinated proteins:

a) Mammalian subunit 4 and homologs in other higher eukaryotes, in yeast (gene YTA5) and fission yeast (gene
b) Mammalian subunit 6 (TBP7) and homologs in other higher eukaryotes and in yeast (gene YTA2).
c) Mammalian subunit 7 (MSS1) and homologs in other higher eukaryotes and in yeast (gene CIM5 or YTA3).
d) Mammalian subunit 8 (P45) and homologs in other higher eukaryotes and in yeast (SUG1 or CIM3 or TBY1)
and fission yeast (gene let1).

[0185] Other probable subunits such as human TBP1, which seems to influence HIV gene expression by interacting
with the virus tat transactivator protein, and yeast YTA1 and YTA6.


- Yeast protein BCS1, a mitochondrial protein essential for the expression of the Rieske iron-sulfur protein.
- Yeast protein MSP1, a protein involved in intramitochondrial sorting of proteins.
- Yeast protein PAS8, and the corresponding proteins PAS5 from Pichia pastoris and PAY4 from Yarrowia lipolytica.
- Mouse protein SKD1 and its fission yeast homolog (SpAC2G11.06).
- Caenorhabditis elegans meiotic spindle formation protein mei-1.
- Yeast protein SAP1.
- Yeast protein YTA7.
- Mycobacterium leprae hypothetical protein A2126A.

[0186] It is proposed that, in general, the AAA domains in these proteins act as ATP-dependent protein clamps [5].
In addition to the ATP-binding 'A' and 'B' motifs, which are located in the N-terminal half of this domain, there is a highly
conserved region located in the central part of the domain which was used to develop a signature pattern.
Consensus pattern: [LIVMT]-x-[LIVMT]-[LIVMF]-x-[GATMC]-[ST]-[NS]-x(4)-[LIVM]-D-x-A-[LIFA]-x-R
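
A consensus pattern written in this notation can be turned into a regular expression and used to scan a protein sequence. The converter below is a minimal sketch covering only the notation elements that occur in this pattern (residue classes in square brackets, `x` for any residue, and repeat counts in parentheses) plus excluded classes in braces. The pattern string is the AAA signature as read here, in which the `[LIVMF]-x` and `[ST]-[NS]` elements are an assumed reading, and the test peptide is a made-up sequence built to satisfy the pattern, not a real protein.

```python
import re

ELEMENT = re.compile(r"(x|[A-Z]|\[[A-Z]+\]|\{[A-Z]+\})(?:\((\d+)(?:,(\d+))?\))?")


def pattern_to_regex(pattern):
    """Convert a dash-separated consensus pattern into a regular expression string."""
    parts = []
    for element in pattern.strip(".").split("-"):
        match = ELEMENT.fullmatch(element)
        if match is None:
            raise ValueError(f"unsupported pattern element: {element!r}")
        core, low, high = match.groups()
        if core == "x":
            core = "."                       # any residue
        elif core.startswith("{"):
            core = "[^" + core[1:-1] + "]"   # any residue except those listed
        if low:
            core += "{" + low + ("," + high if high else "") + "}"
        parts.append(core)
    return "".join(parts)


AAA_SIGNATURE = (
    "[LIVMT]-x-[LIVMT]-[LIVMF]-x-[GATMC]-[ST]-[NS]-x(4)-[LIVM]-D-x-A-[LIFA]-x-R"
)
aaa_regex = re.compile(pattern_to_regex(AAA_SIGNATURE))

# Made-up peptide constructed to satisfy the pattern, flanked by filler residues.
peptide = "MKK" + "LALIAGSNAAAALDAALAR" + "GGG"
hit = aaa_regex.search(peptide)
print(hit.start(), hit.group())  # 3 LALIAGSNAAAALDAALAR
```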

[1] Froehlich K.-U., Fries H.W., Ruediger M., Erdmann R., Botstein D., Mecke D. J. Cell Biol. 114:443-453(1991).
[2] Erdmann R., Wiebel F.F., Flessau A., Rytka J., Beyer A., Froehlich K.-U., Kunau W.-H. Cell 64:499-510(1991).
[3] Peters J.-M., Walsh M.J., Franke W.W. EMBO J. 9:1757-1767(1990).
[4] Kunau W.-H., Beyer A., Goette K., Marzioch M., Saidowsky J., Skaletz-Rorowski A., Wiebel F.F. Biochimie 75:209-224(1993).
[5] Confalonieri F., Duguet M. BioEssays 17:639-650(1995).
[6] Hilt W., Wolf D.H. Trends Biochem. Sci. 21:96-102(1996).

2. ABC Membrane (ABC transporter transmembrane region). This family represents a unit of six transmembrane
helices. Many members of the ABC transporter family (ABC tran) have two such regions. See also descriptions of
ABC_Tran, below, and ABC2 membrane, above.

3. (ABC Tran) 

w 

ABC transporters family signature 

[0187] On the basis of sequence similarities a family of related ATP-binding proteins has been characterized [1 to 5].
These proteins are associated with a variety of distinct biological processes in both prokaryotes and eukaryotes, but a
majority of them are involved in active transport of small hydrophilic molecules across the cytoplasmic membrane. All
these proteins share a conserved domain of some two hundred amino acid residues, which includes an ATP-binding
site. These proteins are collectively known as ABC transporters. Proteins known to belong to this family are listed
below (references are only provided for recently determined sequences). In prokaryotes: - Active transport systems
components: alkylphosphonate uptake (phnC/phnK/phnL); arabinose (araG); arginine (artP); dipeptide (dciAD; dppD/dppF);
ferric enterobactin (fepC); ferrichrome (fhuC); galactoside (mglA); glutamine (glnQ); glycerol-3-phosphate (ugpC);
glycine betaine/L-proline (proV); glutamate/aspartate (gltL); histidine (hisP); iron(III) (sfuC), iron(III) dicitrate (fecE);
lactose (lacK); leucine/isoleucine/valine (braF/braG; livF/livG); maltose (malK); molybdenum (modC); nickel (nikD/nikE);
oligopeptide (amiE/amiF; oppD/oppF); peptide (sapD/sapF); phosphate (pstB); putrescine (potG); ribose (rbsA);
spermidine/putrescine (potA); sulfate (cysA); vitamin B12 (btuD). - Hemolysin/leukotoxin export proteins hlyB, cyaB

and lktB. - Colicin V export protein cvaB. - Lactococcin export protein lcnC [6]. - Lantibiotic transport proteins nisT
(nisin) and spaT (subtilin). - Extracellular proteases B and C export protein prtD. - Alkaline protease secretion protein
aprD. - Beta-(1,2)-glucan export proteins chvA and ndvA. - Haemophilus influenzae capsule-polysaccharide export
protein bexA. - Cytochrome c biogenesis proteins ccmA (also known as cycV and helA). - Polysialic acid transport
protein kpsT. - Cell division associated ftsE protein (function unknown). - Copper processing protein nosF from
Pseudomonas stutzeri. - Nodulation protein nodI from Rhizobium (function unknown). - Escherichia coli proteins cydC and
cydD. - Subunit A of the ABC excision nuclease (gene uvrA). - Erythromycin resistance protein from Staphylococcus
epidermidis (gene msrA). - Tylosin resistance protein from Streptomyces fradiae (gene tlrC) [7]. - Heterocyst
differentiation protein (gene hetA) from Anabaena PCC 7120. - Protein P29 from Mycoplasma hyorhinis, a probable component
of a high affinity transport system. - yhbG, a putative protein whose gene is linked with ntrA in many bacteria such as
Escherichia coli, Klebsiella pneumoniae, Pseudomonas putida, Rhizobium meliloti and Thiobacillus ferrooxidans. -
Escherichia coli and related bacteria hypothetical proteins yabJ, yadG, yagC, ybbA, ycjW, yddA, yehX, yejF, yheS,
yhiG, yhiH, yjcW, yjjK, yojI, yrbF and ytfR. In eukaryotes: - The multidrug transporters (Mdr) (P-glycoprotein), a family
of closely related proteins which extrude a wide variety of drugs out of the cell (for a review see [8]). - Cystic fibrosis
transmembrane conductance regulator (CFTR), which is most probably involved in the transport of chloride ions. -
Antigen peptide transporters 1 (TAP1, PSF1, RING4, HAM-1, mtp1) and 2 (TAP2, PSF2, RING11, HAM-2, mtp2), which
are involved in the transport of antigens from the cytoplasm to a membrane-bound compartment for association with
MHC class I molecules. - 70 Kd peroxisomal membrane protein (PMP70). - ALDP, a peroxisomal protein involved in
X-linked adrenoleukodystrophy [9]. - Sulfonylurea receptor [10], a putative subunit of the B-cell ATP-sensitive potassium
channel. - Drosophila proteins white (w) and brown (bw), which are involved in the import of ommatidium screening
pigments. - Fungal elongation factor 3 (EF-3). - Yeast STE6 which is responsible for the export of the a-factor pheromone.
- Yeast mitochondrial transporter ATM1. - Yeast MDL1 and MDL2. - Yeast SNQ2. - Yeast sporidesmin resistance
protein (gene PDR5 or STS1 or YDR1). - Fission yeast heavy metal tolerance protein hmt1. This protein is probably
involved in the transport of metal-bound phytochelatins. - Fission yeast brefeldin A resistance protein (gene bfr1 or
hba2). - Fission yeast leptomycin B resistance protein (gene pmd1). - mbpX, a hypothetical chloroplast protein from
Liverwort. - Prestalk-specific protein tagB from slime mold. This protein consists of two domains: an N-terminal subtilase
catalytic domain and a C-terminal ABC transporter domain. As a signature pattern for this class of proteins, a conserved
region which is located between the 'A' and the 'B' motifs of the ATP-binding site was used.

[0188] Consensus pattern: [LIVMFYC]-[SA]-[SAPGLVFYKQH]-G-[DENQMW]-[KRQASPCLIMFW]-[KRNQSTAVM]-[KRACLVM]-[LIVMFYPAN]-{PHY}-[LIVMFW]-[SAGCLIVP]-{FYWHP}-{KRHP}-[LIVMFYWSTA]
The ATP-binding region is duplicated in araG, mdl, msrA, rbsA, tlrC, uvrA, yejF, Mdr's, CFTR, pmd1 and in EF-3. In some of these proteins,
the above pattern only detects one of the two copies of the domain. The proteins belonging to this family also contain
one or two copies of the ATP-binding motifs 'A' and 'B'.
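
The remark that the pattern may detect only one of two copies of the domain can be checked mechanically. The following is a minimal Python sketch, not part of the original disclosure: the regular expression is a hand translation of the consensus pattern (the bracket placement at the tenth and eleventh positions is an assumption where the printed pattern is unclear), the function name `count_signature_copies` is hypothetical, and the test sequences are synthetic.

```python
import re

# Hand translation of the signature into a regex; {..} exclusions become [^..].
ABC_SIGNATURE = re.compile(
    "[LIVMFYC][SA][SAPGLVFYKQH]G[DENQMW][KRQASPCLIMFW][KRNQSTAVM]"
    "[KRACLVM][LIVMFYPAN][^PHY][LIVMFW][SAGCLIVP][^FYWHP][^KRHP][LIVMFYWSTA]"
)

def count_signature_copies(sequence: str) -> int:
    """Count non-overlapping occurrences of the signature in a protein sequence."""
    return sum(1 for _ in ABC_SIGNATURE.finditer(sequence))

# Synthetic sequences (assumptions): the same 15-residue segment placed once or twice.
segment = "LSGGQRQRVAIARAL"
one_copy = "MTE" + segment + "KK"
two_copies = "MTE" + segment + "DEPTNNLDPE" + segment + "KK"

print(count_signature_copies(one_copy), count_signature_copies(two_copies))
```

A count of one for a protein known to carry a duplicated ATP-binding region would correspond to the case described above, in which one copy has diverged from the signature.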
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tS 4. (ACBP) 
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a2 e P ine (BZD) ' oc °9 n,1,0 ^!" ^e action ol the GABA receptor 12].ACBP _.s a 9 {Q , he N . termlrna l section 

Consensus pattern: P-[STA]-...

[...] Proc. Natl. Acad. Sci. U.S.A. 89:11287-11291(1992).

5. (AIRS) 

AIR synthase related proteins, including HypE, AIR synthases, FGAM synthase and selenide, water dikinase
6. (AMP-binding) 
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16 


9. (ATP synt)

ATP synthase gamma subunit signature

[0195] ATP synthase (proton-translocating ATPase) (EC 3.6.1.34) [1,2] is a component of the cytoplasmic membrane
of eubacteria, the inner membrane of mitochondria, and the thylakoid membrane of chloroplasts. The ATPase complex
is composed of an oligomeric transmembrane sector, called CF(0), and a catalytic core, called coupling factor CF(1).
The former acts as a proton channel; the latter is composed of five subunits, alpha, beta, gamma, delta and epsilon.
Subunit gamma is believed to be important in regulating ATPase activity and the flow of protons through the CF(0)
complex. The best conserved region of the gamma subunit [3] is its C-terminus which seems to be essential for assembly
and catalysis. As a signature pattern to detect ATPase gamma subunits, a 14 residue conserved segment where
the last amino acid is found one to three residues from the C-terminal extremity was used.

[0196] Consensus pattern: [IV]-T-x-E-x(2)-[DE]-x(3)-G-A-x-[SAKR] Note: Pea chloroplast gamma and two Bacillus
species gamma subunits are not detected by this motif.
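
As a rough illustration of how a signature of this kind can be applied, the following Python sketch translates the pattern notation into a regular expression and reports how many residues follow the match, so that the "one to three residues from the C-terminal extremity" condition can be checked. The names `prosite_to_regex` and `gamma_signature_tail` are hypothetical and not part of the original disclosure, and the test sequence is synthetic.

```python
import re

def prosite_to_regex(pattern: str) -> str:
    """Translate signature notation (e.g. x(3), [DE], {P}) into a Python regex."""
    pieces = []
    for element in pattern.split("-"):
        element = element.strip()
        if not element:
            continue  # tolerate a stray trailing hyphen
        # Split an element into its core and an optional repeat such as (3) or (2,4).
        m = re.fullmatch(r"(.+?)(?:\((\d+)(?:,(\d+))?\))?", element)
        core, low, high = m.groups()
        if core == "x":
            regex = "."                      # any residue
        elif core.startswith("{"):
            regex = "[^" + core[1:-1] + "]"  # any residue except those listed
        else:
            regex = core                     # single residue or [..] class
        if low and high:
            regex += "{%s,%s}" % (low, high)
        elif low:
            regex += "{%s}" % low
        pieces.append(regex)
    return "".join(pieces)

GAMMA_PATTERN = "[IV]-T-x-E-x(2)-[DE]-x(3)-G-A-x-[SAKR]"

def gamma_signature_tail(sequence: str):
    """Return the number of residues after the last match, or None if no match."""
    hits = list(re.finditer(prosite_to_regex(GAMMA_PATTERN), sequence))
    if not hits:
        return None
    return len(sequence) - hits[-1].end()

# Synthetic sequence (assumption): 4 filler residues, a 14-residue match, 2 trailing residues.
toy = "MKLR" + "ITAELIDIVSGAEA" + "LV"
tail = gamma_signature_tail(toy)
print(prosite_to_regex(GAMMA_PATTERN), tail)
```

A sequence would satisfy the condition stated in the text when the returned value lies between 1 and 3.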

[ 1] Futai M., Noumi T., Maeda M. Annu. Rev. Biochem. 58:111-136(1989).
[ 2] Senior A.E. Physiol. Rev. 68:177-231(1988). 
[ 3] Miki J., Maeda M., Mukohata Y, Futai M. FEBS Lett. 232:221-226(1988). 

10. (ATP Synt A)

ATP synthase a subunit signature

[0197] ATP synthase (proton-translocating ATPase) (EC 3.6.1.34) [1,2] is a component of the cytoplasmic membrane
of eubacteria, the inner membrane of mitochondria, and the thylakoid membrane of chloroplasts. The ATPase complex
is composed of an oligomeric transmembrane sector, called CF(0), which acts as a proton channel, and a catalytic
core, termed coupling factor CF(1). The CF(0) a subunit, also called protein 6, is a key component of the proton channel;
it may play a direct role in translocating protons across the membrane. It is a highly hydrophobic protein that has been
predicted to contain 8 transmembrane regions [3]. Sequence comparison of a subunits from all available sources reveals
very few conserved regions. The best conserved region is located in what is predicted to be the fifth transmembrane
domain. This region contains three perfectly conserved residues: an arginine, a leucine and an asparagine. Mutagenesis
experiments of ATPase activity. This region was selected as a signature pattern.

Consensus pattern: [STAGN]-x-[STAG]-[LIVMF]-R-L-x-[SAGV]-N-[LIVMT] [R is important for proton translocation]

[ 1] Futai M., Noumi T., Maeda M. Annu. Rev. Biochem. 58:111-136(1989).
[ 2] Senior A.E. Physiol. Rev. 68:177-231(1988).
[ 3] Lewis M.L., Chang J.A., Simoni R.D. J. Biol. Chem. 265:10541-10550(1990).
[ 4] Cain B.D., Simoni R.D. J. Biol. Chem. 264:3292-3300(1989).

11. ATP synthase B

[0198] Part of the CF(0) (base unit) of the ATP synthase. The base unit is thought to translocate protons through 
membrane (inner membrane in mitochondria, thylakoid membrane in plants, cytoplasmic membrane in bacteria). The 
B subunits are thought to interact with the stalk of the CF(1) subunits. 




12. (ATP synt C) 

ATP synthase c subunit signature 

[0199] ATP synthase (proton-translocating ATPase) [1,2] is a component of the cytoplasmic membrane of eubacteria,
the inner membrane of mitochondria, and the thylakoid membrane of chloroplasts. The ATPase complex is composed
of an oligomeric transmembrane sector, called CF(0), which acts as a proton channel, and a catalytic core, termed
coupling factor CF(1). The CF(0) c subunit (also called protein 9, proteolipid, or subunit III) [3,4] is a highly hydrophobic
protein of about 8 Kd which has been implicated in the proton-conducting activity of ATPase. Structurally subunit c
consists of two long terminal hydrophobic regions, which probably span the membrane, and a central hydrophilic region.
N,N'-dicyclohexylcarbodiimide (DCCD) can bind covalently to subunit c and thereby abolish the ATPase activity. DCCD
binds to a specific glutamate or aspartate residue which is located in the middle of the second hydrophobic region near
the C-terminus of the protein. A signature pattern which includes the DCCD-binding residue was derived.
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[0200] Consensus pattern: [GSTA]-R-[NQ]-P-x(10)-[UVMFYW](2)-x(3)-[LIVMFYW]-x-[DE] [D or E binds DCCD] 

[ 1] Futai M., Noumi T., Maeda M. Annu. Rev. Biochem. 58:111-136(1989).
[ 2] Senior A.E. Physiol. Rev. 68:177-231(1988).
[ 3] Ivaschenko A.T., Karpenyuk T.A., Ponomarenko S.V. Biokhimiia 56:406-419(1991).
[ 4] Recipon H., Perasso R., Adoutte A., Quetier F. J. Mol. Evol. 34:292-303(1992).

13. (ATP synt DE) 

ATP synthase, Delta/Epsilon chain

[0201] Part of the ATP synthase CF(1). These subunits are part of the head unit of the ATP synthase. The subunits
are called delta and epsilon in human and metazoan species, but in bacterial species the delta (D) subunit is the
equivalent of the oligomycin sensitive subunit (OSCP) in metazoans.


14. (ATP synt ab) 

ATP synthase alpha and beta subunits signature 

[0202] ATP synthase (proton-translocating ATPase) [1,2] is a component of the cytoplasmic membrane of eubacteria,
the inner membrane of mitochondria, and the thylakoid membrane of chloroplasts. The ATPase complex is composed
of an oligomeric transmembrane sector, called CF(0), and a catalytic core, called coupling factor CF(1). The former
acts as a proton channel; the latter is composed of five subunits, alpha, beta, gamma, delta and epsilon. The sequences
of subunits alpha and beta are related and both contain a nucleotide-binding site for ATP and ADP. The beta chain has
catalytic activity, while the alpha chain is a regulatory subunit. Vacuolar ATPases [3] (V-ATPases) are responsible for
acidifying a variety of intracellular compartments in eukaryotic cells. Like F-ATPases, they are oligomeric complexes
of a transmembrane and a catalytic sector. The sequence of the largest subunit of the catalytic sector (70 Kd) is related
to that of F-ATPase beta subunit, while a 60 Kd subunit, from the same sector, is related to the F-ATPases alpha subunit
[4]. Archaebacterial membrane-associated ATPases are composed of three subunits. The alpha chain is related to
F-ATPases beta chain and the beta chain is related to F-ATPases alpha chain [4]. A protein highly similar to F-ATPase
beta subunits is found [5] in some bacterial apparatus involved in a specialized protein export pathway that proceeds
without signal peptide cleavage. This protein is known as fliI in Bacillus and Salmonella, Spa47 (mxiB) in Shigella
flexneri, HrpB6 in Xanthomonas campestris and yscN in Yersinia virulence plasmids. To detect these ATPase subunits,
a segment of ten amino-acid residues, containing two conserved serines, was selected as a signature pattern. The
first serine seems to be important for catalysis - in the ATPase alpha chain at least - as its mutagenesis causes catalytic
impairment.

[0203] Consensus pattern: P-[SAP]-[LIV]-[DNH]-x(3)-S-x-S [The first S is a putative active site residue] 
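
To report where the putative active site serine falls in a given sequence, the pattern can be written as a regular expression with the first serine captured in a named group. This is a minimal Python sketch under stated assumptions: the regex is a hand translation of the consensus pattern, the group name `first_ser` and function name `first_serine_positions` are hypothetical, and the test sequence is synthetic.

```python
import re

# P-[SAP]-[LIV]-[DNH]-x(3)-S-x-S, with the first serine captured as a named group.
AB_SIGNATURE = re.compile(r"P[SAP][LIV][DNH].{3}(?P<first_ser>S).S")

def first_serine_positions(sequence: str):
    """Return 1-based positions of the first conserved serine for every match."""
    return [m.start("first_ser") + 1 for m in AB_SIGNATURE.finditer(sequence)]

# Synthetic sequence (assumption): 12 filler residues, one 10-residue match, 4 trailing residues.
toy = "GKTVLAMELIRN" + "PAIDPLLSTS" + "RLLD"
print(first_serine_positions(toy))
```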

[ 1] Futai M., Noumi T., Maeda M. Annu. Rev. Biochem. 58:111-136(1989).
[ 2] Senior A.E. Physiol. Rev. 68:177-231(1988).
[ 3] Nelson N. J. Bioenerg. Biomembr. 21:553-571(1989).
[ 4] Gogarten J.P., Kibak H., Dittrich P., Taiz L., Bowman E.J., Bowman B.J., Manolson M.F., Poole R.J., Date T.,
Oshima T., Konishi J., Denda K., Yoshida M. Proc. Natl. Acad. Sci. U.S.A. 86:6661-6665(1989).
[ 5] Dreyfus G., Williams A.W., Kawagishi I., MacNab R.M. J. Bacteriol. 175:3131-3138(1993).

15. (ATP synt ab C)

ATP synthase ab C terminal.

[0204] Number of members: 190

[1] Abrahams JP, Leslie AG, Lutter R, Walker JE; "Structure at 2.8 A resolution of F1-ATPase from bovine heart
mitochondria." Nature 1994;370:621-628.

16. (A deaminase)

Adenosine and AMP deaminase signature

[0205] Adenosine deaminase catalyzes the hydrolytic deamination of adenosine into inosine. AMP deaminase catalyzes
the hydrolytic deamination of AMP into IMP. It has been shown [1] that these two types of enzymes share three
regions of sequence similarities; these regions are centered on residues which are proposed to play an important role
in the catalytic mechanism of these two enzymes. One of these regions, containing two conserved aspartic acid residues
that are potential active site residues, was selected.
Consensus pattern: [SA]-[LIVM]-[NGS]-[STA]-D-D-P [The two D's are putative active site residues]
[1] Chang Z., Nygaard P., Chinault A.C., Kellems R.E. Biochemistry 30:2273-2280(1991).

17. (Acetyltransf) 

Acetyltransferase (GNAT) family.

[0206] This family contains proteins with N-acetyltransferase functions. 
[1] Neuwald AF, Landsman D; Trends Biochem Sci 1997;22:154-155. 

18. (Aconitase C)

Aconitase family signature

[0207] Aconitase (aconitate hydratase) (EC 4.2.1.3) [1] is the enzyme from the tricarboxylic acid cycle that catalyzes
the reversible isomerization of citrate and isocitrate. Cis-aconitate is formed as an intermediary product during the
course of the reaction. In eukaryotes two isozymes of aconitase are known to exist: one found in the mitochondrial
matrix and the other found in the cytoplasm. Aconitase, in its active form, contains a 4Fe-4S iron-sulfur cluster; three
cysteine residues have been shown to be ligands of the 4Fe-4S cluster. It has been shown that the aconitase family
also contains the following proteins: - Iron-responsive element binding protein (IRE-BP). IRE-BP is a cytosolic protein
that binds to iron-responsive elements (IREs). IREs are stem-loop structures found in the 5'UTR of ferritin and
delta-aminolevulinic acid synthase mRNAs, and in the 3'UTR of transferrin receptor mRNA. IRE-BP also expresses aconitase
activity. - 3-isopropylmalate dehydratase (EC 4.2.1.33) (isopropylmalate isomerase), the enzyme that catalyzes the
second step in the biosynthesis of leucine. - Homoaconitase (EC 4.2.1.36) (homoaconitate hydratase), an enzyme that
participates in the alpha-aminoadipate pathway of lysine biosynthesis and that converts cis-homoaconitate into
homoisocitric acid. - Escherichia coli protein ybhJ. As a signature for proteins from the aconitase family, two conserved
regions that contain the three cysteine ligands of the 4Fe-4S cluster were selected.

Consensus pattern: [LIVM]-x(2)-[GSACIVM]-x-[LIV]-[GTIV]-[STP]-C-x(0,1)-T-N-[GSTANI]-x(4)-[LIVMA] [C binds the
iron-sulfur center]

Consensus pattern: G-x(2)-[LIVWPQ]-x(3)-[GAC]-C-[GSTAM]-[LIMPTA]-C-[LIMV]-[GA] [The two C's bind the
iron-sulfur center]

[ 1] Gruer M.J., Artymiuk P.J., Guest J.R. Trends Biochem. Sci. 22:3-6(1997). 

19. (Acyl-CoA dh) 

Acyl-CoA dehydrogenases signatures

[0208] Acyl-CoA dehydrogenases [1,2,3] are enzymes that catalyze the alpha, beta-dehydrogenation of acyl-CoA
esters and transfer electrons to ETF, the electron transfer protein. Acyl-CoA dehydrogenases are FAD flavoproteins.
This family currently includes: - Five eukaryotic isozymes that catalyze the first step of the beta-oxidation cycles for
fatty acids with various chain lengths. These are short (SCAD) (EC 1.3.99.2), medium (MCAD) (EC 1.3.99.3), long
(LCAD) (EC 1.3.99.13), very-long (VLCAD) and short/branched (SBCAD) chain acyl-CoA dehydrogenases. These
enzymes are located in the mitochondrion. They are all homotetrameric proteins of about 400 amino acid residues
except VLCAD which is a dimer and which contains, in its mature form, about 600 residues. - Glutaryl-CoA
dehydrogenase (EC 1.3.99.7) (GCDH), which is involved in the catabolism of lysine, hydroxylysine and tryptophan. -
Isovaleryl-CoA dehydrogenase (EC 1.3.99.10) (IVD), involved in the catabolism of leucine. - Acyl-CoA dehydrogenases acsA and
mmgC from Bacillus subtilis. - Butyryl-CoA dehydrogenase (EC 1.3.99.2) from Clostridium acetobutylicum. -
Escherichia coli protein caiA [4]. - Escherichia coli protein aidB. Two conserved regions were selected as signature
patterns. The first is located in the center of these enzymes, the second in the C-terminal section.
Consensus pattern: [GAC]-[LIVM]-[ST]-E-x(2)-[GSAN]-G-[ST]-D-x(2)-[GSA]

Consensus pattern: [QDE]-x(2)-G-[GS]-x-G-[LIVMFY]-x(2)-[DEN]-x(4)-[KR]-x(3)-[DEN]

[ 1] Tanaka K., Ikeda Y., Matsubara Y., Hyman D.B. Enzyme 38:91-107(1987).
[ 2] Matsubara Y., Indo Y., Naito E., Ozasa H., Glassberg R., Vockley J., Ikeda Y., Kraus J., Tanaka K. J. Biol.
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Chem. 264:16321-16331(1989). 

[ 3] Aoyama T., Ueno I., Kamijo T., Hashimoto T. J. Biol. Chem. 269:19086-19094(1994).
[ 4] Eichler K., Bourgis F., Buchet A., Kleber H.-P., Mandrand-Berthelot M.-A. Mol. Microbiol. 13:775-786(1994).

20. (Acyl transf)

Acyl transferase domain

[0209] Number of members: 161

[1] Serre L, Verbree EC, Dauter Z, Stuitje AR, Derewenda ZS; Medline: 95286570 "The Escherichia coli malonyl-CoA:
acyl carrier protein transacylase at 1.5-A resolution. Crystal structure of a fatty acid synthase component." J Biol Chem
1995;270:12961-12964.

21. Acylphosphatase signatures 

[0210] Acylphosphatase (EC 3.6.1.7) [1,2] catalyzes the hydrolysis of various acylphosphate carboxyl-phosphate
bonds such as carbamyl phosphate, succinylphosphate, 1,3-diphosphoglycerate, etc. The physiological role of this
enzyme is not yet clear. Acylphosphatase is a small protein of around 100 amino-acid residues. There are two known
isozymes. One seems to be specific to muscular tissues; the other, called 'organ-common type', is found in many
different tissues. While acylphosphatases have so far been characterized only in vertebrates, there are a number of
bacterial and archaebacterial hypothetical proteins that are highly similar to that enzyme and that probably possess the
same activity. These proteins are: - Escherichia coli hypothetical protein yccX. - Bacillus subtilis hypothetical protein
yflL. - Archaeoglobus fulgidus hypothetical protein AF0818. Two conserved regions were selected as signature
patterns. The first is located in the N-terminal section, while the second is found in the central part of the protein sequence.
Consensus pattern: [LIV]-x-G-x-V-Q-G-V-x-[FM]-R 
Consensus pattern: G-[FYW]-[AVC]-[KRQAM]-N-x(3)-G-x-V-x(5)-G 
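
Because the first signature lies in the N-terminal section and the second in the central part, a candidate sequence can be screened by requiring both patterns in that order. The following Python sketch illustrates one way to do this; the regular expressions are hand translations of the two consensus patterns, the function name `acylphosphatase_signature_starts` is hypothetical, and the test sequence is synthetic rather than a real acylphosphatase.

```python
import re

# Hand translations of the two acylphosphatase signatures quoted above.
SIG_1 = re.compile(r"[LIV].G.VQGV.[FM]R")
SIG_2 = re.compile(r"G[FYW][AVC][KRQAM]N.{3}G.V.{5}G")

def acylphosphatase_signature_starts(sequence: str):
    """Return (start1, start2) as 0-based offsets if both signatures occur in order, else None."""
    first = SIG_1.search(sequence)
    if first is None:
        return None
    # Look for the second signature only downstream of the first one.
    second = SIG_2.search(sequence, first.end())
    if second is None:
        return None
    return first.start(), second.start()

# Synthetic sequence (assumption): filler, signature 1, linker, signature 2, filler.
toy = "MAEGDTLISVDYE" + "VKGKVQGVFFR" + "KHTQAEAKKL" + "GWVRNTSDGSVEAELQG" + "PASKVRHMQEWLETK"
print(acylphosphatase_signature_starts(toy))
```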

[ 1] Stefani M., Ramponi G. Life Chem. Rep. 12:271-301(1995).
[ 2] Stefani M., Taddei N., Ramponi G. Cell. Mol. Life Sci. 53:141-151(1997).

22. (Adap comp sub) 

Clathrin adaptor complexes medium chain signatures. 

[0211] Clathrin coated vesicles (CCV) mediate intracellular membrane traffic such as receptor mediated endocytosis.
In addition to clathrin, the CCV are composed of a number of other components including oligomeric complexes which
are known as adaptor or clathrin assembly proteins (AP) complexes [1]. The adaptor complexes are believed to interact
with the cytoplasmic tails of membrane proteins, leading to their selection and concentration. In mammals two types of
adaptor complexes are known: AP-1 which is associated with the Golgi complex and AP-2 which is associated with
the plasma membrane. Both AP-1 and AP-2 are heterotetramers that consist of two large chains - the adaptins -
(gamma and beta' in AP-1; alpha and beta in AP-2); a medium chain (AP47 in AP-1; AP50 in AP-2) and a small chain
(AP19 in AP-1; AP17 in AP-2). The medium chains of AP-1 and AP-2 are evolutionarily related proteins of about 50 Kd.
Homologs of AP47 and AP50 have also been found in Caenorhabditis elegans (genes unc-101 and ap50) [2] and yeast
(gene APM1 or YAP54) [3]. Some more divergent, but clearly evolutionarily related proteins have also been found in
yeast: APM2 and YBR288c. Two conserved regions were selected as signature patterns, one located in the N-terminal
region, the other from the central section of these proteins.

Consensus pattern: [IVT]-[GSP]-W-R-x(2,3)-[GAD]-x(2)-[HY]-x(2)-N-x-[LIVMAFY](3)-D-[LIVM]-[LIVMT]-E
Consensus pattern: [LIV]-x-F-I-P-P-x-G-x-[LIVMFY]-x-L-x(2)-Y

[ 1] Pearse B.M., Robinson M.S. Annu. Rev. Cell Biol. 6:151-171(1990).
[ 2] Lee J., Jongeward G.D., Sternberg P.W. Genes Dev. 8:60-73(1994).
[ 3] Nakayama Y., Goebl M., O'Brine G.B., Lemmon S., Pingchang C.E., Kirchhausen T. Eur. J. Biochem. 202:
569-574(1991).
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of alkylation products (EC 3.2.2.21). Tag and alkA do not share any region of sequence similarity. In yeast there is an 
alkylbase DNA glycosidase (gene MAG1) [2,3], which can remove 3-methyladenine or 7-methyladenine and which is 
structurally related to alkA. MAG and alkA are both proteins of about 300 amino acid residues. While the C- and N-
terminal ends appear to be unrelated, there is a central region of about 130 residues which is well conserved. A portion
of this region has been selected as a signature pattern.

Consensus pattern: G-I-G-x-W-[ST]-[AV]-x-[LIVMFY](2)-x-[LIVM]-x(8)-[MF]-x(2)-[ED]-D

[1] Lindahl T., Sedgwick B. Annu. Rev. Biochem. 57:133-157(1988). 

[ 2] Berdal K.G., Bjoras M., Bjelland S., Seeberg E.C. EMBO J. 9:4563-4568(1990). 

[ 3] Chen J., Derfler B., Samson L. EMBO J. 9:4569-4575(1990). 

28. Ammonium transporters signature 

[0217] A number of proteins involved in the transport of ammonium ions across a membrane as well as some yet
uncharacterized proteins have been shown [1,2] to be evolutionarily related. These proteins are: - Yeast ammonium
transporters MEP1, MEP2 and MEP3. - Arabidopsis thaliana high affinity ammonium transporter (gene AMT1). -
Corynebacterium glutamicum ammonium and methylammonium transport system. - Escherichia coli putative ammonium
transporter amtB. - Bacillus subtilis nrgA. - Mycobacterium tuberculosis hypothetical protein MtCY338.09c. -
Synechocystis strain PCC 6803 hypothetical proteins sll0108, sll0537 and sll1017. - Methanococcus jannaschii hypothetical
proteins MJ0058 and MJ1343. - Caenorhabditis elegans hypothetical proteins C05E11.4, F49E11.3 and M195.3. As
expected from their transport function, these proteins are highly hydrophobic and seem to contain from 10 to 12
transmembrane domains. The best conserved region seems to be located in the fifth (or sixth) transmembrane region and
is used as a signature pattern.

Consensus pattern: D-[FYWS]-A-G-[GSC]-x(2)-[IV]-x(3)-[SAG](2)-x(2)-[SAG]-[LIVMF]-x(3)-[LIVMFYWA](2)-x-[GK]-x-R

[ 1] Ninnemann O., Jauniaux J.-C., Frommer W.B. EMBO J. 13:3464-3471(1994).

[ 2] Siewe R.M., Weil B., Burkovski A., Eikmanns B.J., Eikmanns M., Kraemer R. J. Biol. Chem. 271:5398-5403 
(1996). 

[ 3] Saier M.H. Jr. Adv. Microbiol. Physiol. 40:81-136(1998). 

29. (Arch_histone) 
CBF/NF-Y subunits signatures 

[0218] Diverse DNA binding proteins are known to bind the CCAAT box, a common cis-acting element found in the
promoter and enhancer regions of a large number of genes in eukaryotes. Amongst these proteins is one known as
the CCAAT-binding factor (CBF) or NF-Y [1]. CBF is a heteromeric transcription factor that consists of two different
components both needed for DNA-binding. The HAP protein complex of yeast binds to the upstream activation site of
cytochrome C iso-1 gene (CYC1) as well as other genes involved in mitochondrial electron transport and activates
their expression. It also recognizes the sequence CCAAT and is structurally and evolutionarily related to CBF. The first
subunit of CBF, known as CBF-A or NF-YB in vertebrates, HAP3 in budding yeast and as php3 in fission yeast, is a
protein of 116 to 210 amino-acid residues which contains a highly conserved central domain of about 90 residues. This
domain seems to be involved in DNA-binding; a signature pattern has been developed from its central part. The second
subunit of CBF, known as CBF-B or NF-YA in vertebrates, HAP2 in budding yeast and php2 in fission yeast, is a protein
of 265 to 350 amino-acid residues which contains a highly conserved region of about 60 residues. This region, called
the 'essential core' [2], seems to consist of two subdomains: an N-terminal subunit-association domain and a C-terminal
DNA recognition domain. A signature pattern has been developed from a section of the subunit-association domain.
Consensus pattern: C-V-S-E-x-I-S-F-[LIVM]-T-[SG]-E-A-[SC]-[DE]-[KRQ]-C
Consensus pattern: Y-V-N-A-K-Q-Y-x-R-I-L-K-R-R-x-A-R-A-K-L-E

[ 1] Li X.-Y., Mantovani R., Hooft van Huijsduijnen R., Andre I., Benoist C., Mathis D. Nucleic Acids Res. 20:
1087-1091(1992).
[ 2] Olesen J.T., Fikes J.D., Guarente L. Mol. Cell. Biol. 11:611-619(1991).

30. Argininosuccinate synthase signatures
[0219] Argininosuccinate synthase (EC 6.3.4.5 ) (AS) is a urea cycle enzyme that catalyzes the penultimate step in 
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10 


arginine biosynthesis: the ATP-dependent ligation of citrulline to aspartate to form argininosuccinate, AMP and
pyrophosphate [1,2]. In humans, a defect in the AS gene causes citrullinemia, a genetic disease characterized by severe
vomiting spells and mental retardation. AS is a homotetrameric enzyme of chains of about 400 amino-acid residues.
An arginine seems to be important for the enzyme's catalytic mechanism. The sequences of AS from various prokaryotes,
archaebacteria and eukaryotes show significant similarity. Two signature patterns have been selected for AS.
The first is a highly conserved stretch of nine residues located in the N-terminal extremity of these enzymes; the second
is derived from a conserved region which contains one of the conserved arginine residues.
Consensus pattern: [AS]-[FY]-S-G-G-[LV]-D-T-[ST]
Consensus pattern: G-x-T-x-K-G-N-D-x(2)-R-F

[ 1] van Vliet F., Crabeel M., Boyen A., Tricot C., Stalon V., Falmagne P., Nakamura Y., Baumberg S., Glansdorff
N. Gene 95:99-104(1990).
[ 2] Morris C.J., Reeve J.N. J. Bacteriol. 170:3125-3130(1988).

31. Armadillo/beta-catenin-like repeats

[0220] Approx. 40 amino acid repeat. Tandem repeats form super-helix of helices that is proposed to mediate inter- 
action of beta-catenin with its ligands. CAUTION: This family does not contain all known armadillo repeats. 

[1] Huber AH, Nelson WJ, Weis WI, Cell 1997;90:871-882.
[2] Gumbiner BM, Curr Opin Cell Biol 1995;7:634-640.
[3] Cavallo R, Rubenstein D, Peifer M, Curr Opin Genet Dev 1997;7:459-466.
[4] Su LK, Vogelstein B, Kinzler KW, Science 1993;262:1734-1737.
[5] Masiarz FR, Munemitsu S, Polakis P, Science 1993;262:1731-1734.
[6] Peifer M, Wieschaus E, Cell 1990;63:1167-1176.

32. (Asn Synthase) 
Asparagine synthase 

[0221] This family is always found associated with GATase 2. Members of this family catalyse the conversion of 
aspartate to asparagine. 

33. Asparaginase
Asparaginase 12 members 

34. (Aspartyl tRNA N) 

Aminoacyl-transfer RNA synthetases class-II signatures

[0222] Aminoacyl-tRNA synthetases (EC 6.1.1.-) [1] are a group of enzymes which activate amino acids and transfer
them to specific tRNA molecules as the first step in protein biosynthesis. In prokaryotic organisms there are at least
twenty different types of aminoacyl-tRNA synthetases, one for each different amino acid. In eukaryotes there are
generally two aminoacyl-tRNA synthetases for each different amino acid: one cytosolic form and a mitochondrial form.
While all these enzymes have a common function, they are widely diverse in terms of subunit size and of quaternary
structure. The synthetases specific for alanine, asparagine, aspartic acid, glycine, histidine, lysine, phenylalanine,
proline, serine, and threonine are referred to as class-II synthetases [2 to 6] and probably have a common folding
pattern in their catalytic domain for the binding of ATP and amino acid which is different from the Rossmann fold observed
for the class I synthetases [7]. Class-II tRNA synthetases do not share a high degree of similarity; however, at least
three conserved regions are present [2,5,8]. Signature patterns have been derived from two of these regions.
Consensus pattern: [FYH]-R-x-[DE]-x(4,12)-[RH]-x(3)-F-x(3)-[DE] 

Consensus pattern: [GSTALVF]-{DENQHRKP}-[GSTA]-[LIVMF]-[DE]-R-[LIVMF]-x-[LIVMSTAG]-[LIVMFY]
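
The first of these two patterns contains a variable-length gap, x(4,12), which corresponds to a bounded repeat in a regular expression. The Python sketch below shows the effect of the bounds; the regex is a hand translation of the first consensus pattern, the function names are hypothetical, and the test sequences are synthetic constructs in which only the gap length varies.

```python
import re

# [FYH]-R-x-[DE]-x(4,12)-[RH]-x(3)-F-x(3)-[DE]; x(4,12) becomes the bounded gap .{4,12}
CLASS_II_SIG_1 = re.compile(r"[FYH]R.[DE].{4,12}[RH].{3}F.{3}[DE]")

def has_first_signature(sequence: str) -> bool:
    """Report whether the first class-II signature occurs anywhere in the sequence."""
    return CLASS_II_SIG_1.search(sequence) is not None

def toy_sequence(gap: int) -> str:
    """Build a synthetic sequence (assumption) whose variable gap has the given length."""
    return "FRAD" + "A" * gap + "RAAAFAAAD"

results = {gap: has_first_signature(toy_sequence(gap)) for gap in (3, 4, 12, 13)}
print(results)
```

Gaps shorter than 4 or longer than 12 residues fall outside the pattern, so only the two middle constructs are reported as matches.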

[1] Schimmel P. Annu. Rev. Biochem. 56:125-158(1987).

[ 2] Delarue M., Moras D. BioEssays 15:675-687(1993). 
[ 3] Schimmel P. Trends Biochem. Sci. 16:1-3(1991). 

[ 4] Nagel G.M., Doolittle R.F. Proc. Natl. Acad. Sci. U.S.A. 88:8121-8125(1991). 
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[ 5) Cusack S., Haertlein M. ( Leberman R. Nucleic Acids Res. 19:3489-3498(1991). 
[6] Cusack S. Biochimie 75:1077-1081(1993). 

[ 7] Cusack S., Berthet-Colominas C., Haertlein M., Nassar N., Leberman R. Nature 347:249-255(1990).
[ 8] Leveque F., Plateau P., Dessen P., Blanquet S. Nucleic Acids Res. 18:305-312(1990). 


35. (ArfGap) Putative GTP-ase activating protein for Arf. Putative zinc fingers with GTPase activating proteins (GAPs)
towards the small GTPase, Arf. The GAP of ARD1 stimulates GTPase hydrolysis for ARD1 but not ARFs. Number of
members: 34

[0223]

[1]Medline: 96324970. Identification and cloning of centaurin-alpha. A novel phosphatidylinositol 3,4,5-trisphosphate-binding
protein from rat brain. Hammonds-Odie LP, Jackson TR, Profit AA, Blader IJ, Turck CW, Prestwich
GD, Theibert AB; J Biol Chem 1996;271:18859-18868.
[2]Medline: 97296423. A target of phosphatidylinositol 3,4,5-trisphosphate with a zinc finger motif similar to that
of the ADP-ribosylation-factor GTPase-activating protein and two pleckstrin homology domains. Tanaka K, Imajoh-Ohmi
S, Sawada T, Shirai R, Hashimoto Y, Iwasaki S, Kaibuchi K, Kanaho Y, Shirai T, Terada Y, Kimura K, Nagata
S, Fukui Y; Eur J Biochem 1997;245:512-519.
[3]Medline: 98112795. Molecular characterization of the GTPase-activating domain of ADP-ribosylation factor domain
protein 1 (ARD1). Vitale N, Moss J, Vaughan M; J Biol Chem 1998;273:2553-2560.

36. Apolipoprotein. Apolipoprotein A1/A4/E family. This family includes: Swiss:P02647 Apolipoprotein A-I.
Swiss:P06727 Apolipoprotein A-IV. Swiss:P02649 Apolipoprotein E. These proteins contain several 22 residue repeats
which form a pair of alpha helices. Number of members: 42

[0224] [1] Medline: 91289138. Three-dimensional structure of the LDL receptor-binding domain of human
apolipoprotein E. Wilson C, Wardell MR, Weisgraber KH, Mahley RW, Agard DA; Science 1991;252:1817-1822.

37. Amino acid permeases signature 

[0225] Amino acid permeases are integral membrane proteins involved in the transport of amino acids into the cell.
A number of such proteins have been found to be evolutionarily related [1,2,3]. These proteins are: - Yeast general
amino acid permeases (genes GAP1, AGP2 and AGP3). - Yeast basic amino acid permease (gene ALP1). - Yeast
Leu/Val/Ile permease (gene BAP2). - Yeast arginine permease (gene CAN1). - Yeast dicarboxylic amino acid permease
(gene DIP5). - Yeast asparagine/glutamine permease (gene AGP1). - Yeast glutamine permease (gene GNP1). - Yeast
histidine permease (gene HIP1). - Yeast lysine permease (gene LYP1). - Yeast proline permease (gene PUT4). - Yeast
valine and tyrosine permease (gene VAL1/TAT1). - Yeast tryptophan permease (gene TAT2/SCM2). - Yeast choline
transport protein (gene HNM1/CTR1). - Yeast GABA permease (gene UGA4). - Yeast hypothetical protein YKL174c.
- Fission yeast protein isp5. - Fission yeast hypothetical protein SpAC8A4.11. - Fission yeast hypothetical protein
SpAC11D3.08c. - Emericella nidulans proline transport protein (gene prnB). - Trichoderma harzianum amino acid
permease INDA1. - Salmonella typhimurium L-asparagine permease (gene ansP). - Escherichia coli aromatic amino acid
transport protein (gene aroP). - Escherichia coli D-serine/D-alanine/glycine transporter (gene cycA). - Escherichia coli
GABA permease (gene gabP). - Escherichia coli lysine-specific permease (gene lysP). - Escherichia coli
phenylalanine-specific permease (gene pheP). - Salmonella typhimurium proline-specific permease (gene proY).
- Escherichia coli and Klebsiella pneumoniae hypothetical protein yeeF. - Escherichia coli and Salmonella typhimurium
hypothetical protein yifK. - Bacillus subtilis permeases rocC and rocE, which probably transport arginine or ornithine.
These proteins seem to contain up to 12 transmembrane segments. As a signature for this family of proteins, the best
conserved region, which is located in the second transmembrane segment, has been selected.

Consensus pattern: [STAGC]-G-[PAG]-x(2,3)-[LIVMFYWA](2)-x-[LIVMFYW]-x-[LIVMFWSTAGC](2)-[STAGC]-x(3)-
[LIVMFYWT]-x-[LIVMST]-x(3)-[LIVMCTA]-[GA]-E-x(5)-[PSAL]

[ 1] Weber E., Chevalier M.R., Jund R. J. Mol. Evol. 27:341-350(1988).
[ 2] Vandenbol M., Jauniaux J.-C., Grenson M. Gene 83:153-159(1989).
[ 3] Reizer J., Finley K., Kakuda D., McLeod C.L., Reizer A., Saier M.H. Jr. Protein Sci. 2:20-30(1993).

38. aakinase (1) Glutamate 5-kinase signature
[0226] Glutamate 5-kinase (EC 2.7.2.11) (gamma-glutamyl kinase) (GK) is the enzyme that catalyzes the first step
in the biosynthesis of proline from glutamate, the ATP-dependent phosphorylation of L-glutamate into L-glutamate
5-phosphate. In eubacteria (gene proB) and yeast [1] (gene PRO1), GK is a monofunctional protein, while in plants
and mammals, it is a bifunctional enzyme (P5CS) [2] that consists of two domains: an N-terminal GK domain and a
C-terminal gamma-glutamyl phosphate reductase domain (EC 1.2.1.41) (see <PDOC00940>). As a signature pattern, a
highly conserved glycine- and alanine-rich region located in the central section of these enzymes has been selected.
Yeast hypothetical protein YHR033w is highly similar to GK.

Consensus pattern: [GSTN]-x(2)-G-x-G-[GC]-[IM]-x-[STA]-K-[LIVM]-x-[SA]-[TCA]-x(2)-[GALV]-x(3)-G

[ 1] Li W., Brandriss M.C. J. Bacteriol. 174:4148-4156(1992).
[ 2] Hu C.-A.A., Delauney A.J., Verma D.P.S. Proc. Natl. Acad. Sci. U.S.A. 89:9354-9358(1992).

aakinase (2) Aspartokinase signature

[0227] Aspartokinase (EC 2.7.2.4) (AK) [1] catalyzes the phosphorylation of aspartate. The product of this reaction
can then be used in the biosynthesis of lysine or in the pathway leading to homoserine, which participates in the
biosynthesis of threonine, isoleucine and methionine. In Escherichia coli, there are three different isozymes which differ
in their sensitivity to repression and inhibition by Lys, Met and Thr. AK1 (gene thrA) and AK2 (gene metL) are bifunctional
enzymes which both consist of an N-terminal AK domain and a C-terminal homoserine dehydrogenase domain. AK1
is involved in threonine biosynthesis and AK2 in that of methionine. The third isozyme, AK3 (gene lysC), is
monofunctional and involved in lysine synthesis. In yeast, there is a single isozyme of AK (gene HOM3). As a signature
pattern for AK, a conserved region located in the N-terminal extremity has been selected.
Consensus pattern: [LIVM]-x-K-[FY]-G-G-[ST]-[SC]-[LIVM]
[ 1] Rafalski J.A., Falco S.C. J. Biol. Chem. 263:2146-2151(1988).
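
As an illustration of how a consensus pattern of this kind can be applied, the following is a minimal sketch that translates the pattern notation used throughout this section into a Python regular expression and scans a sequence with it. The function name `prosite_to_regex` and the toy sequence are assumptions made for this example only; they do not come from the specification, and the sequence is not a real aspartokinase. The aspartokinase pattern is written with its brackets normalized.

```python
import re


def prosite_to_regex(pattern: str) -> str:
    """Translate a consensus pattern such as '[LIVM]-x-K-[FY]' into a regex.

    Handles: x (any residue), [ABC] (allowed residues), {ABC} (excluded
    residues), single residues, and repeat suffixes such as (2) or (4,12).
    """
    parts = []
    for element in pattern.strip().rstrip("-").split("-"):
        # Separate the core symbol from an optional repeat suffix.
        m = re.fullmatch(r"(.+?)(?:\((\d+)(?:,(\d+))?\))?", element)
        core, low, high = m.group(1), m.group(2), m.group(3)
        if core == "x":
            token = "."                      # any residue
        elif core.startswith("[") and core.endswith("]"):
            token = core                     # one of the listed residues
        elif core.startswith("{") and core.endswith("}"):
            token = "[^" + core[1:-1] + "]"  # any residue except those listed
        else:
            token = re.escape(core)          # a single fixed residue
        if low is not None:
            token += "{" + low + ("," + high if high else "") + "}"
        parts.append(token)
    return "".join(parts)


AK_PATTERN = "[LIVM]-x-K-[FY]-G-G-[ST]-[SC]-[LIVM]"
toy_sequence = "MAAVVQKFGGTSVADK"  # hypothetical sequence, for illustration only

hit = re.search(prosite_to_regex(AK_PATTERN), toy_sequence)
if hit:
    print(hit.start(), hit.group())
```

A plain regular expression search of this sort reports only non-overlapping exact matches; it is not a substitute for the profile or HMM searches referred to elsewhere in this document.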

aakinase (3) Gamma-glutamyl phosphate reductase signature

[0228] Gamma-glutamyl phosphate reductase (EC 1.2.1.41) (GPR) is the enzyme that catalyzes the second step in
the biosynthesis of proline from glutamate, the NADP-dependent reduction of L-glutamate 5-phosphate into
L-glutamate 5-semialdehyde and phosphate. In eubacteria (gene proA) and yeast [1] (gene PRO2), GPR is a
monofunctional protein, while in plants and mammals, it is a bifunctional enzyme (P5CS) [2] that consists of two
domains: an N-terminal glutamate 5-kinase domain (EC 2.7.2.11) (see <PDOC00701>) and a C-terminal GPR domain.
As a signature pattern, a conserved region that contains two histidine residues has been selected. This region is
located in the last third of GPR.
Consensus pattern: V-x(5)-A-[LIV]-x-H-I-x(2)-[HY]-[GS]-[ST]-x-H-[ST]-[DE]-x-I

[ 1] Pearson B.M., Hernando Y., Payne J., Wolf S.S., Kalogeropoulos A., Schweizer M. Yeast 12:1021-1031(1996).
[ 2] Hu C.-A.A., Delauney A.J., Verma D.P.S. Proc. Natl. Acad. Sci. U.S.A. 89:9354-9358(1992).

39. (abhydrolase) alpha/beta hydrolase fold. This catalytic domain is found in a very wide range of enzymes.

[0229] [1] Ollis DL, Cheah E, Cygler M, Dijkstra B, Frolow F, Franken SM, Harel M, Remington SJ, Silman I, Schrag
J, Sussman JL, Verschueren KHG, Goldman A; Protein Eng 1992;5:197-211.

40. (Acid phosphat) Histidine acid phosphatases signatures 

[0230] Acid phosphatases (EC 3.1.3.2) are a heterogeneous group of proteins that hydrolyze phosphate esters,
optimally at low pH. It has been shown [1] that a number of acid phosphatases, from both prokaryotes and eukaryotes,
share two regions of sequence similarity, each centered around a conserved histidine residue. These two histidines
seem to be involved in the enzymes' catalytic mechanism [2,3]. The first histidine is located in the N-terminal section
and forms a phosphohistidine intermediate, while the second is located in the C-terminal section and possibly acts as
proton donor. Enzymes belonging to this family are called 'histidine acid phosphatases' and are listed below:

- Escherichia coli pH 2.5 acid phosphatase (gene appA).
- Escherichia coli glucose-1-phosphatase (EC 3.1.3.10) (gene agp).
- Yeast constitutive and repressible acid phosphatases (genes PHO3 and PHO5).
- Fission yeast acid phosphatase (gene pho1).
- Aspergillus phytases A and B (EC 3.1.3.8) (gene phyA and phyB).
- Mammalian lysosomal acid phosphatase.
- Mammalian prostatic acid phosphatase.

in the biosynthesis of D-alanyl-lipoteichoic acid. - Human follicular variant translocation protein 1 (FVT1). - Mouse
adipocyte protein p27. - Mouse protein Ke 6. - Maize sex determination protein TASSELSEED 2. - Sarcophaga
peregrina 25 kd development specific protein. - Drosophila fat body protein P6. - A Listeria monocytogenes hypothetical
protein encoded in the internalins gene region. - Escherichia coli hypothetical protein yciK. - Escherichia coli
hypothetical protein ydfG. - Escherichia coli hypothetical protein yjgI. - Escherichia coli hypothetical protein yjgU.
- Escherichia coli hypothetical protein yohF. - Bacillus subtilis hypothetical protein yoxD. - Bacillus subtilis hypothetical
protein ywfD. - Bacillus subtilis hypothetical protein ywfH. - Yeast hypothetical protein YIL124w. - Yeast hypothetical
protein YIR035C. - Yeast hypothetical protein YIR036C. - Yeast hypothetical protein YKL055C. - Fission yeast
hypothetical protein SpAC23D3.11. One of the best conserved regions, which includes two perfectly conserved
residues, a tyrosine and a lysine, has been used as a signature pattern for this family of proteins. The tyrosine residue
participates in the catalytic mechanism.

Consensus pattern: [LIVSPADNK]-x(12)-Y-[PSTAGNCV]-[STAGNQCIVM]-[STAGC]-K-{PC}-[SAGFYR]-
[LIVMSTAGD]-x(2)-[LIVMFYW]-x(3)-[LIVMFYWGAPTHQ]-[GSACQRHM] [Y is an active site residue]

[ 1] Joernvall H., Persson B., Krook M., Atrian S., Gonzalez-Duarte R., Jeffery J., Ghosh D. Biochemistry
34:6003-6013(1995).
[ 2] Villarroya A., Juan E., Egestad B., Joernvall H. Eur. J. Biochem. 180:191-197(1989).
[ 3] Persson B., Krook M., Joernvall H. Eur. J. Biochem. 200:537-543(1991).
[ 4] Neidle E.L., Hartnett C., Ornston N.L., Bairoch A., Rekik M., Harayama S. Eur. J. Biochem. 204:113-120(1992).

[0238] 46. (adh_zinc) Zinc-containing alcohol dehydrogenases signatures. Alcohol dehydrogenase (EC 1.1.1.1)
(ADH) catalyzes the reversible oxidation of ethanol to acetaldehyde with the concomitant reduction of NAD [1]. Currently
three structurally and catalytically different types of alcohol dehydrogenases are known: - Zinc-containing 'long-chain'
alcohol dehydrogenases. - Insect-type, or 'short-chain' alcohol dehydrogenases. - Iron-containing alcohol
dehydrogenases. Zinc-containing ADH's [2,3] are dimeric or tetrameric enzymes that bind two atoms of zinc per subunit.
One of the zinc atoms is essential for catalytic activity while the other is not. Both zinc atoms are coordinated by either
cysteine or histidine residues; the catalytic zinc is coordinated by two cysteines and one histidine. Zinc-containing ADH's
are found in bacteria, mammals, plants, and in fungi. In most species there is more than one isozyme (for example,
humans have at least six isozymes, yeast has three, etc.). A number of other zinc-dependent dehydrogenases are
closely related to zinc ADH [4]; these are: - Xylitol dehydrogenase (EC 1.1.1.9) (D-xylulose reductase). - Sorbitol
dehydrogenase (EC 1.1.1.14). - Aryl-alcohol dehydrogenase (EC 1.1.1.90) (benzyl alcohol dehydrogenase). - Threonine
3-dehydrogenase (EC 1.1.1.103). - Cinnamyl-alcohol dehydrogenase (EC 1.1.1.195) (CAD) [5]. CAD is a plant enzyme
involved in the biosynthesis of lignin. - Galactitol-1-phosphate dehydrogenase (EC 1.1.1.251). - Pseudomonas putida
5-exo-alcohol dehydrogenase (EC 1.1.1.-) [6]. - Escherichia coli starvation sensing protein rspB. - Escherichia coli
hypothetical protein yjgB. - Escherichia coli hypothetical protein yjgV. - Escherichia coli hypothetical protein yjjN. - Yeast
hypothetical protein YAL060w (FUN49). - Yeast hypothetical protein YAL061w (FUN50). - Yeast hypothetical protein
YCR105w. The pattern that has been developed to detect this class of enzymes is based on a conserved region that
includes a histidine residue which is the second ligand of the catalytic zinc atom. This family also includes
NADP-dependent quinone oxidoreductase (EC 1.6.5.5), an enzyme found in bacteria (gene qor), in yeast and in mammals
where, in some species such as rodents, it has been recruited as an eye lens protein and is known as zeta-crystallin
[7]. The sequence of quinone oxidoreductase is distantly related to that of other zinc-containing alcohol dehydrogenases
and it lacks the zinc-ligand residues. The torpedo fish and mammalian synaptic vesicle membrane protein vat-1 is related
to qor. A specific pattern has been developed for this subfamily.
Consensus pattern: G-H-E-x(2)-G-x(5)-[GA]-x(2)-[IVSAC] [H is a zinc ligand]

Consensus pattern: [GSD]-[DEQH]-x(2)-L-x(3)-[SA](2)-G-G-x-G-x(4)-Q-x(2)-[KR]

[ 1] Branden C.-I., Joernvall H., Eklund H., Furugren B. (In) The Enzymes (3rd edition) 11:104-190(1975).
[ 2] Joernvall H., Persson B., Jeffery J. Eur. J. Biochem. 167:195-201(1987).
[ 3] Sun H.-W., Plapp B.V. J. Mol. Evol. 34:522-535(1992).
[ 4] Persson B., Hallborn J., Walfridsson M., Hahn-Haegerdal B., Keraenen S., Penttilae M., Joernvall H. FEBS
Lett. 324:9-14(1993).
[ 5] Knight M.E., Halpin C., Schuch W. Plant Mol. Biol. 19:793-801(1992).
[ 6] Koga H., Aramaki H., Yamaguchi E., Takeuchi K., Horiuchi T., Gunsalus I.C. J. Bacteriol. 166:1089-1095(1986).
[ 7] Joernvall H., Persson B., Du Bois G., Lavers G.C., Chen J.H., Gonzalez R., Rao P.V., Zigler J.S. Jr. FEBS Lett.
322:240-244(1993).

[0239] 47. (aldedh) Aldehyde dehydrogenases active sites 

[0240] Aldehyde dehydrogenases (EC 1.2.1.3 and EC 1.2.1.5) are enzymes which oxidize a wide variety of aliphatic
and aromatic aldehydes. In mammals at least four different forms of the enzyme are known [1]: class-1 (or Ald C), a
tetrameric cytosolic enzyme; class-2 (or Ald M), a tetrameric mitochondrial enzyme; class-3 (or Ald D), a dimeric cytosolic
enzyme; and class IV, a microsomal enzyme. Aldehyde dehydrogenases have also been sequenced from fungal and
bacterial species. A number of enzymes are known to be evolutionarily related to aldehyde dehydrogenases; these
enzymes are listed below. - Plant and bacterial betaine-aldehyde dehydrogenase (EC 1.2.1.8) [2], an enzyme that
catalyzes the last step in the biosynthesis of betaine. - Plant and bacterial NADP-dependent
glyceraldehyde-3-phosphate dehydrogenase (EC 1.2.1.9). - Escherichia coli succinate-semialdehyde dehydrogenase
(NADP+) (EC 1.2.1.16) (gene gabD) [3], which reduces succinate semialdehyde into succinate. - Escherichia coli
lactaldehyde dehydrogenase (EC 1.2.1.22) (gene ald) [4]. - Mammalian succinate semialdehyde dehydrogenase (NAD+)
(EC 1.2.1.24). - Escherichia coli phenylacetaldehyde dehydrogenase (EC 1.2.1.39). - Escherichia coli
5-carboxymethyl-2-hydroxymuconate semialdehyde dehydrogenase (gene hpcC). - Pseudomonas putida
2-hydroxymuconic semialdehyde dehydrogenase [5] (genes dmpC and xylG), an enzyme in the meta-cleavage pathway
for the degradation of phenols, cresols and catechol. - Bacterial and mammalian methylmalonate-semialdehyde
dehydrogenase (MMSDH) (EC 1.2.1.27) [6], an enzyme involved in the distal pathway of valine catabolism. - Yeast
delta-1-pyrroline-5-carboxylate dehydrogenase (EC 1.5.1.12) [7] (gene PUT2), which converts proline to glutamate.
- Bacterial multifunctional putA protein, which contains a delta-1-pyrroline-5-carboxylate dehydrogenase domain.
- 26G, a garden pea protein of unknown function which is induced by dehydration of shoots [8]. - Mammalian
formyltetrahydrofolate dehydrogenase (EC 1.5.1.6) [9]. This is a cytosolic enzyme responsible for the NADP-dependent
decarboxylative reduction of 10-formyltetrahydrofolate into tetrahydrofolate. It is a protein of about 900 amino acids
which consists of three domains; the C-terminal domain (480 residues) is structurally and functionally related to
aldehyde dehydrogenases. - Yeast hypothetical protein YBR006w. - Yeast hypothetical protein YER073w. - Yeast
hypothetical protein YHR039c. - Caenorhabditis elegans hypothetical protein F01F1.6. A glutamic acid and a cysteine
residue have been implicated in the catalytic activity of mammalian aldehyde dehydrogenase. These residues are
conserved in all the enzymes of this family. Two patterns have been derived for this family, one for each of the active
site residues.

Consensus pattern: [LIVMFGA]-E-[LIMSTAC]-[GS]-G-[KNLM]-[SADN]-[TAPFV] [E is the active site residue]

Consensus pattern: [FYLVA]-x(3)-G-[QE]-x-C-[LIVMGSTANC]-[AGCN]-x-[GSTADNEKR] [C is the active site residue]

[ 1] Hempel J., Harper K., Lindahl R. Biochemistry 28:1160-1167(1989).
[ 2] Weretilnyk E.A., Hanson A.D. Proc. Natl. Acad. Sci. U.S.A. 87:2745-2749(1990).
[ 3] Niegemann E., Schulz A., Bartsch K. Arch. Microbiol. 160:454-460(1993).
[ 4] Hidalgo E., Chen Y.-M., Lin E.C.C., Aguilar J. J. Bacteriol. 173:6118-6123(1991).
[ 5] Nordlund I., Shingler V. Biochim. Biophys. Acta 1049:227-230(1990).
[ 6] Steele M.I., Lorenz D., Hatter K., Park A., Sokatch J.R. J. Biol. Chem. 267:13585-13592(1992).
[ 7] Krzywicki K.A., Brandriss M.C. Mol. Cell. Biol. 4:2837-2842(1984).
[ 8] Guerrero F.D., Jones J.T., Mullet J.E. Plant Mol. Biol. 15:11-26(1990).
[ 9] Cook R.J., Lloyd R.S., Wagner C. J. Biol. Chem. 266:4965-4973(1991).
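
Because the first aldehyde dehydrogenase pattern flags a specific residue (the glutamate) as the active site, a match is most useful when it reports where that residue falls in the sequence. The sketch below hand-writes the first pattern as a regular expression with the glutamate in a named group and returns its 1-based position. The names `ALDH_GLU` and `active_site_positions`, and the toy sequence, are hypothetical and serve only to illustrate the idea.

```python
import re

# First aldehyde dehydrogenase pattern, brackets normalized, with the
# active-site glutamate captured in a named group (an illustrative choice).
ALDH_GLU = re.compile(r"[LIVMFGA](?P<site>E)[LIMSTAC][GS]G[KNLM][SADN][TAPFV]")


def active_site_positions(sequence: str) -> list[int]:
    """Return the 1-based positions of the glutamate flagged by the pattern."""
    return [m.start("site") + 1 for m in ALDH_GLU.finditer(sequence)]


toy_sequence = "MKTLELGGKSPAA"  # hypothetical sequence, for illustration only
print(active_site_positions(toy_sequence))
```

The same approach applies to the second pattern by placing the group around the cysteine instead.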

[0241] 48. Aldo/keto reductase family signatures 

The aldo-keto reductase family [1,2] groups together a number of structurally and functionally related
NADPH-dependent oxidoreductases as well as some other proteins. The proteins known to belong to this family are:
- Aldehyde reductase (EC 1.1.1.2). - Aldose reductase (EC 1.1.1.21). - 3-alpha-hydroxysteroid dehydrogenase
(EC 1.1.1.50), which terminates androgen action by converting 5-alpha-dihydrotestosterone to
3-alpha-androstanediol. - Prostaglandin F synthase (EC 1.1.1.188), which catalyzes the reduction of prostaglandins H2
and D2 to F2-alpha. - D-sorbitol-6-phosphate dehydrogenase (EC 1.1.1.200) from apple. - Morphine 6-dehydrogenase
(EC 1.1.1.218) from Pseudomonas putida plasmid pMDH7.2 (gene morA). - Chlordecone reductase (EC 1.1.1.225),
which reduces the pesticide chlordecone (kepone) to the corresponding alcohol. - 2,5-diketo-D-gluconic acid reductase
(EC 1.1.1.-), which catalyzes the reduction of 2,5-diketogluconic acid to 2-keto-L-gulonic acid, a key intermediate in the
production of ascorbic acid. - NAD(P)H-dependent xylose reductase (EC 1.1.1.-) from the yeast Pichia stipitis. This
enzyme reduces xylose into xylitol. - Trans-1,2-dihydrobenzene-1,2-diol dehydrogenase (EC 1.3.1.20).
- 3-oxo-5-beta-steroid 4-dehydrogenase (EC 1.3.99.6), which catalyzes the reduction of delta(4)-3-oxosteroids.
- A soybean reductase, which co-acts with chalcone synthase in the formation of 4,2',4'-trihydroxychalcone. - Frog eye
lens rho crystallin. - Yeast GCY protein, whose function is not known. - Leishmania major P110/11E protein. P110/11E
is a developmentally regulated protein whose abundance is markedly elevated in promastigotes compared with
amastigotes. Its exact function is not yet known. - Escherichia coli hypothetical protein yafB. - Escherichia coli
hypothetical protein yghE. - Yeast hypothetical protein YBR149W. - Yeast hypothetical protein YHR104w. - Yeast
hypothetical protein YJR096w. These proteins all have about 300 amino acid residues. Three consensus patterns have
been developed that are specific to this family of proteins. The first one is located in the N-terminal section of these
proteins. The second pattern is located in the central section. The third pattern, located in the C-terminal, is centered
on a lysine residue whose chemical modification, in aldose and aldehyde reductases, affects the catalytic efficiency.

Consensus pattern: G-[FY]-R-[HSAL]-[LIVMF]-D-[STAGC]-[AS]-x(5)-E-x(2)-[LIVM]-G
Consensus pattern: [LIVMFY]-x(9)-[KREQ]-x-[LIVM]-G-[LIVM]-[SC]-N-[FY]

Consensus pattern: [LIVM]-[PAIV]-[KR]-[ST]-x(4)-R-x(2)-[GSTAEQK]-[NSL]-x(2)-[LIVMFA] [K is a putative active site
residue]

[ 1] Bohren K.M., Bullock B., Wermuth B., Gabbay K.H. J. Biol. Chem. 264:9547-9551(1989).
[ 2] Bruce N.C., Willey D.L., Coulson A.F.W., Jeffery J. Biochem. J. 299:805-811(1994).

[0242] 49. Alpha amylase. This family is classified as family 13 of the glycosyl hydrolases. The structure is an 8
stranded alpha/beta barrel, interrupted by a ~70 a.a. calcium-binding domain protruding between beta strand 3 and
alpha helix 3, and a carboxyl-terminal Greek key beta-barrel domain.

[1] Larson SB, Greenwood A, Cascio D, Day J, McPherson A; J Mol Biol 1994;235:1560-1584.
[0243] 50. Aminotransferases class-I pyridoxal-phosphate attachment site

Aminotransferases share certain mechanistic features with other pyridoxal-phosphate dependent enzymes, such as
the covalent binding of the pyridoxal-phosphate group to a lysine residue. On the basis of sequence similarity, these
various enzymes can be grouped [1,2] into subfamilies. One of these, called class-I, currently consists of the following
enzymes: - Aspartate aminotransferase (AAT) (EC 2.6.1.1). AAT catalyzes the reversible transfer of the amino group
from L-aspartate to 2-oxoglutarate to form oxaloacetate and L-glutamate. In eukaryotes, there are two AAT isozymes:
one is located in the mitochondrial matrix, the second is cytoplasmic. In prokaryotes, only one form of AAT is found
(gene aspC). - Tyrosine aminotransferase (EC 2.6.1.5), which catalyzes the first step in tyrosine catabolism by reversibly
transferring its amino group to 2-oxoglutarate to form 4-hydroxyphenylpyruvate and L-glutamate. - Aromatic
aminotransferase (EC 2.6.1.57) involved in the synthesis of Phe, Tyr, Asp and Leu (gene tyrB).
- 1-aminocyclopropane-1-carboxylate synthase (EC 4.4.1.14) (ACC synthase) from plants. ACC synthase catalyzes the
first step in ethylene biosynthesis. - Pseudomonas denitrificans cobC, which is involved in cobalamin biosynthesis.
- Yeast hypothetical protein YJL060w. The sequence around the pyridoxal-phosphate attachment site of this class of
enzyme is sufficiently conserved to allow the creation of a specific pattern.

Consensus pattern: [GS]-[LIVMFYTAC]-[GSTA]-K-x(2)-[GSALVN]-[LIVMFA]-x-[GNAR]-x-R-[LIVMA]-[GA] [K is the
pyridoxal-P attachment site]

[ 1] Bairoch A. Unpublished observations (1992).
[ 2] Sung M.H., Tanizawa K., Tanaka H., Kuramitsu S., Kagamiyama H., Hirotsu K., Okamoto A., Higuchi T., Soda
K. J. Biol. Chem. 266:2567-2572(1991).

[0244] 51. Aminotransferases class-II pyridoxal-phosphate attachment site

Aminotransferases share certain mechanistic features with other pyridoxal-phosphate dependent enzymes, such as
the covalent binding of the pyridoxal-phosphate group to a lysine residue. On the basis of sequence similarity, these
various enzymes can be grouped [1] into subfamilies. One of these, called class-II, currently consists of the following
enzymes: - Glycine acetyltransferase (EC 2.3.1.29), which catalyzes the addition of acetyl-CoA to glycine to form
2-amino-3-oxobutanoate (gene kbl). - 5-aminolevulinic acid synthase (EC 2.3.1.37) (delta-ALA synthase), which
catalyzes the first step in heme biosynthesis via the Shemin (or C4) pathway, i.e. the addition of succinyl-CoA to glycine
to form 5-aminolevulinate. - 8-amino-7-oxononanoate synthase (EC 2.3.1.47) (7-KAP synthetase), a bacterial enzyme
(gene bioF) which catalyzes an intermediate step in the biosynthesis of biotin: the addition of 6-carboxy-hexanoyl-CoA
to alanine to form 8-amino-7-oxononanoate. - Histidinol-phosphate aminotransferase (EC 2.6.1.9), which catalyzes
the eighth step in the histidine biosynthetic pathway: the transfer of an amino group from 3-(imidazol-4-yl)-2-oxopropyl
phosphate to glutamic acid to form histidinol phosphate and 2-oxoglutarate. - Serine palmitoyltransferase (EC 2.3.1.50)
from yeast (genes LCB1 and LCB2), which catalyzes the condensation of palmitoyl-CoA and serine to form
3-ketosphinganine. The sequence around the pyridoxal-phosphate attachment site of this class of enzyme is sufficiently
conserved to allow the creation of a specific pattern.

Consensus pattern: T-[LIVMFYW]-[STAG]-K-[SAG]-[LIVMFYWR]-[SAG]-x(2)-[SAG] [K is the pyridoxal-P attachment site]

[ 1] Bairoch A. Unpublished observations (1991). 

[0245] 52. Aminotransferases class-III pyridoxal-phosphate attachment site

Aminotransferases share certain mechanistic features with other pyridoxal-phosphate dependent enzymes, such as
the covalent binding of the pyridoxal-phosphate group to a lysine residue. On the basis of sequence similarity, these
various enzymes can be grouped [1,2] into subfamilies. One of these, called class-III, currently consists of the following
enzymes: - Acetylornithine aminotransferase (EC 2.6.1.11), which catalyzes the transfer of an amino group from
acetylornithine to alpha-ketoglutarate, yielding N-acetyl-glutamic-5-semi-aldehyde and glutamic acid. - Ornithine
aminotransferase (EC 2.6.1.13), which catalyzes the transfer of an amino group from ornithine to alpha-ketoglutarate,
yielding glutamic-5-semi-aldehyde and glutamic acid. - Omega-amino acid-pyruvate aminotransferase (EC 2.6.1.18),
which catalyzes transamination between a variety of omega-amino acids, mono- and diamines, and pyruvate. It plays
a pivotal role in omega amino acids metabolism. - 4-aminobutyrate aminotransferase (EC 2.6.1.19) (GABA
transaminase), which catalyzes the transfer of an amino group from GABA to alpha-ketoglutarate, yielding succinate
semialdehyde and glutamic acid. - DAPA aminotransferase (EC 2.6.1.62), a bacterial enzyme (gene bioA) which
catalyzes an intermediate step in the biosynthesis of biotin, the transamination of 7-keto-8-aminopelargonic acid (7-KAP)
to form 7,8-diaminopelargonic acid (DAPA). - 2,2-dialkylglycine decarboxylase (EC 4.1.1.64), a Pseudomonas cepacia
enzyme (gene dgdA) that catalyzes the decarboxylating amino transfer of 2,2-dialkylglycine and pyruvate to dialkyl
ketone, alanine and carbon dioxide. - Glutamate-1-semialdehyde aminotransferase (EC 5.4.3.8) (GSA). GSA is the
enzyme involved in the second step of porphyrin biosynthesis, via the C5 pathway. It transfers the amino group on
carbon 2 of glutamate-1-semialdehyde to the neighbouring carbon, to give delta-aminolevulinic acid. - Bacillus subtilis
aminotransferase yhxA. - Bacillus subtilis aminotransferase yodT. - Haemophilus influenzae aminotransferase HI0949.
- Caenorhabditis elegans aminotransferase T01B11.2. The sequence around the pyridoxal-phosphate attachment site
of this class of enzyme is sufficiently conserved to allow the creation of a specific pattern.
Consensus pattern: [LIVMFYWC](2)-x-D-E-[IVA]-x(2)-G-... FC]-[LIVMFYSTA]-x(2)-[GSA]-K-x(3)-[GSTADNV]-[GSAC]
[K is the pyridoxal-P attachment site]

[ 1] Bairoch A. Unpublished observations (1992).
[ 2] Yonaha K., Nishie M., Aibara S. J. Biol. Chem. 267:12506-12510(1992).

[0246] 53. Ank repeat. There's no clear separation between noise and signal on the HMM search. Ankyrin repeats
generally consist of a beta, alpha, alpha, beta order of secondary structures. The repeats associate to form a higher
order structure.

[1] A, Holak TA; FEBS Lett 1997;401:127-132.
[2] Lux SE, John KM, Bennett V; Nature 1990;345:736-739.

[0247] 54. Aminotransferases class-IV signature 

[0248] Aminotransferases share certain mechanistic features with other pyridoxal-phosphate dependent enzymes,
such as the covalent binding of the pyridoxal-phosphate group to a lysine residue. On the basis of sequence similarity,
these various enzymes can be grouped [1,2] into subfamilies. One of these, called class-IV, currently consists of the
following enzymes:

- Branched-chain amino-acid aminotransferase (EC 2.6.1.42) (transaminase B), a bacterial (gene ilvE) and
eukaryotic enzyme which catalyzes the reversible transfer of an amino group from 4-methyl-2-oxopentanoate to
glutamate, to form leucine and 2-oxoglutarate.
- D-alanine aminotransferase (EC 2.6.1.21). A bacterial enzyme which catalyzes the transfer of the amino group
from D-alanine (and other D-amino acids) to 2-oxoglutarate, to form pyruvate and D-aspartate.
- 4-amino-4-deoxychorismate (ADC) lyase (gene pabC). A bacterial enzyme that converts ADC into
4-aminobenzoate (PABA) and pyruvate.

[0249] The above enzymes are proteins of about 270 to 415 amino-acid residues that share a few regions of sequence
similarity. Surprisingly, the best-conserved region does not include the lysine residue to which the pyridoxal-phosphate
group is known to be attached, in ilvE. The region that has been selected as a signature pattern is located some 40
residues at the C-terminus side of the PLP-lysine.
Consensus pattern: E-x-[STAGCI]-x(2)-N-[LIVMFAC]-[FY]-...

[1] Green J.M., Merkel W.K., Nichols B.P. J. Bacteriol. 174:5317-5323(1992). 
[2] Bairoch A. Unpublished observations (1992). 

[0250] 55. Aminotransferases class-V pyridoxal-phosphate attachment site

Aminotransferases share certain mechanistic features with other pyridoxal-phosphate dependent enzymes, such as
the covalent binding of the pyridoxal-phosphate group to a lysine residue. On the basis of sequence similarity, these
various enzymes can be grouped [1,2] into subfamilies. One of these, called class-V, currently consists of the following
enzymes: - Phosphoserine aminotransferase (EC 2.6.1.52), an enzyme which catalyzes the reversible interconversion
of phosphoserine and 2-oxoglutarate to 3-phosphonooxypyruvate and glutamate. It is required both in the major
phosphorylated pathway of serine biosynthesis and in pyridoxine biosynthesis. The bacterial enzyme (gene serC) is
highly similar to a rabbit endometrial progesterone-induced protein (EPIP), which is probably a phosphoserine
aminotransferase [3]. - Serine-glyoxylate aminotransferase (EC 2.6.1.45) (SGAT) (gene sgaA) from Methylobacterium
extorquens. - Serine-pyruvate aminotransferase (EC 2.6.1.51). This enzyme also acts as an alanine-glyoxylate
aminotransferase (EC 2.6.1.44). In vertebrates, it is located in the peroxisomes and/or mitochondria. - Isopenicillin N
epimerase (gene cefD). This enzyme is involved in the biosynthesis of cephalosporin antibiotics and catalyzes the
reversible isomerization of isopenicillin N and penicillin N. - NifS, a protein of the nitrogen fixation operon of some
bacteria and cyanobacteria. The exact function of nifS is not yet known. A highly similar protein has been found in fungi
(gene NFS1 or SPL1). - The small subunit of cyanobacterial soluble hydrogenase (EC 1.12.-.-). - Hypothetical protein
ycbU from Bacillus subtilis. - Hypothetical protein YFL030w from yeast. The sequence around the pyridoxal-phosphate
attachment site of this class of enzyme is sufficiently conserved to allow the creation of a specific pattern.
Consensus pattern: [LIVFYCHT]-[DGH]-[LIVMFYAC]-[LIVMFYA]-x(2)-[GSTAC]-[GSTA]-[HQR]-K-x(4,6)-G-x-[GSAT]-
x-[LIVMFYSAC] [K is the pyridoxal-P attachment site]

[ 1] Ouzounis C., Sander C. FEBS Lett. 322:159-164(1993).
[ 2] Bairoch A. Unpublished observations (1992).
[ 3] van der Zel A., Lam H.-M., Winkler M.E. Nucleic Acids Res. 17:8379-8379(1989).


[0251] 56. Annexins repeated domain signature

Annexins [1 to 6] are a group of calcium-binding proteins that associate reversibly with membranes. They bind to
phospholipid bilayers in the presence of micromolar free calcium concentration. The binding is specific for calcium and
for acidic phospholipids. Annexins have been claimed to be involved in cytoskeletal interactions, phospholipase
inhibition, intracellular signalling, anticoagulation, and membrane fusion. Each of these proteins consists of an
N-terminal domain of variable length followed by four or eight copies of a conserved segment of sixty-one residues.
The repeat (sometimes known as an 'endonexin fold') consists of five alpha-helices that are wound into a right-handed
superhelix [7]. The proteins known to belong to the annexin family are listed below: - Annexin I (Lipocortin 1)
(Calpactin 2) (p35) (Chromobindin 9). - Annexin II (Lipocortin 2) (Calpactin 1) (Protein I) (p36) (Chromobindin 8).
- Annexin III (Lipocortin 3) (PAP-III). - Annexin IV (Lipocortin 4) (Endonexin I) (Protein II) (Chromobindin 4).
- Annexin V (Lipocortin 5) (Endonexin 2) (VAC-alpha) (Anchorin CII) (PAP-I). - Annexin VI (Lipocortin 6) (Protein III)
(Chromobindin 20) (p68) (p70). This is the only known annexin that contains 8 (instead of 4) repeats. - Annexin VII
(Synexin). - Annexin VIII (Vascular anticoagulant-beta) (VAC-beta). - Annexin IX from Drosophila. - Annexin X from
Drosophila. - Annexin XI (Calcyclin-associated annexin) (CAP-50). - Annexin XII from Hydra vulgaris. - Annexin XIII
(Intestine-specific annexin) (ISA). The signature pattern for this domain spans positions 9 to 61 of the repeat and
includes the only perfectly conserved residue.

Consensus 0 "ST 2 |TG]-|STV]-x(8)-|LIVMF]-x(2)-R-x(3)-[DEQNH].x(7).[IFY]- x(7)-[LIVMF]-x(3)-[LlVMF]-x(11)- 
[LIVMFA]-x(2)-|LIVMF]- 
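The span quoted for the annexin signature (positions 9 to 61 of the 61-residue repeat) can be checked by counting the positions in the consensus pattern. The following is a minimal Python sketch of that arithmetic; the helper name `pattern_length_range` is hypothetical and is not part of any PROSITE tool, and the pattern string is the annexin consensus as read from this entry.

```python
import re

# One pattern element: a residue, a [..] class or the wildcard x,
# optionally followed by a repeat count (n) or (m,n).
_ELEMENT = re.compile(r"(\[[A-Z]+\]|[A-Zx])(?:\((\d+)(?:,(\d+))?\))?")

def pattern_length_range(pattern):
    """Return (min_len, max_len) of a PROSITE-style consensus pattern."""
    min_len = max_len = 0
    for element in pattern.strip("-").split("-"):
        m = _ELEMENT.fullmatch(element)
        if m is None:
            raise ValueError(f"unrecognised pattern element: {element!r}")
        low = int(m.group(2)) if m.group(2) else 1
        high = int(m.group(3)) if m.group(3) else low
        min_len += low
        max_len += high
    return min_len, max_len

ANNEXIN = ("[TG]-[STV]-x(8)-[LIVMF]-x(2)-R-x(3)-[DEQNH]-x(7)-[IFY]-x(7)-"
           "[LIVMF]-x(3)-[LIVMF]-x(11)-[LIVMFA]-x(2)-[LIVMF]")

length = pattern_length_range(ANNEXIN)   # the pattern has a fixed length
start = 9                                # first repeat position covered
end = start + length[0] - 1              # last repeat position covered
# Number of pattern positions that precede the single fixed arginine.
before_arg = pattern_length_range(ANNEXIN.split("-R-")[0])[0]
arg_position = start + before_arg        # repeat position of the arginine
print(length, end, arg_position)
```

Under these assumptions the pattern is 53 positions long, so a match starting at position 9 ends at position 61, and the only fixed residue of the pattern, the arginine, falls on position 22 of the repeat.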

[ 1] Raynal P., Pollard H.B. Biochim. Biophys. Acta 1197:63-93(1994).
[ 2] Barton G.J., Newman R.H., Freemont P.S., Crumpton M.J. Eur. J. Biochem. 198:749-760(1991).
[ 3] Burgoyne R.D., Geisow M.J. Cell Calcium 10:1-10(1989).
[ 4] Haigler H.T., Fitch J.M., Jones J.M., Schlaepfer D.D. Trends Biochem. Sci. 14:48-50(1989).
[ 5] Klee C.B. Biochemistry 27:6645-6653(1988).
[ 6] Smith P.D., Moss S.E. Trends Genet. 10:241-246(1994).
[ 7] Huber R., Roemisch J., Paques E.-P. EMBO J. 9:3867-3874(1990).
[ 8] Fiedler K., Simons K. Trends Biochem. Sci. 20:177-178(1995).

[0252] 57. (arf_1) ADP-ribosylation factors family signature

ADP-ribosylation factors (ARF) [1,2,3,4] are 20 Kd GTP-binding proteins involved in protein trafficking. They may
modulate vesicle budding and uncoating within the Golgi apparatus. ARFs also act as allosteric activators of cholera toxin
ADP-ribosyltransferase activity. They are evolutionarily conserved and present in all eukaryotes. At least six forms of
ARF are present in mammals and three in budding yeast. The ARF family also includes proteins highly related to ARFs
but which lack the cholera toxin cofactor activity; they are collectively known as ARLs (ARF-like). ARD1 is a 64 Kd
mammalian protein of unknown biological function that contains an ARF domain at its C-terminal extremity. Proteins
from the ARF family are generally included in the RAS 'superfamily' of small GTP-binding proteins [5], but they are
only slightly related to the other RAS proteins. They also differ from RAS proteins in that they lack cysteine residues
at their C-termini and are therefore not subject to prenylation. The ARFs are N-terminally myristoylated (the ARLs have
not yet been shown to be modified in such a fashion). A conserved region in the C-terminal part of ARFs and ARLs
has been selected as a signature pattern.

Consensus pattern: [HRQT]-x-[FYWI]-x-[LIVM]-x(4)-A-x(2)-G-x(2)-[LIVM]-x(2)-[GSA]-[LIVMF]-x-[WK]-[LIVM]

Note: proteins belonging to this family also contain a copy of the ATP/GTP-binding motif 'A' (P-loop) (see <PDOC00017>).

[ 1] Boman A.L., Kahn R.A. Trends Biochem. Sci. 20:147-150(1995).
[ 2] Moss J., Vaughan M. Cell. Signal. 4:367-399(1993).
[ 3] Moss J., Vaughan M. Prog. Nucleic Acid Res. Mol. Biol. 45:47-65(1993).
[ 4] Amor J.C., Harrison D.H., Kahn R.A., Ringe D. Nature 372:704-708(1994).
[ 5] Valencia A., Chardin P., Wittinghofer A., Sander C. Biochemistry 30:4637-4648(1991).

[0253] (arf_2) ATP/GTP-binding site motif A (P-loop) 

From sequence comparisons and crystallographic data analysis it has been shown [1,2,3,4,5,6] that an appreciable
proportion of proteins that bind ATP or GTP share a number of more or less conserved sequence motifs. The best
conserved of these motifs is a glycine-rich region, which typically forms a flexible loop between a beta-strand and an
alpha-helix. This loop interacts with one of the phosphate groups of the nucleotide. This sequence motif is generally
referred to as the 'A' consensus sequence [1] or the 'P-loop' [5]. There are numerous ATP- or GTP-binding proteins in
which the P-loop is found. A number of protein families for which the relevance of the presence of such motif has been
noted are listed below: - ATP synthase alpha and beta subunits (see <PDOC00137>). - Myosin heavy chains. - Kinesin
heavy chains and kinesin-like proteins (see <PDOC00343>). - Dynamins and dynamin-like proteins (see
<PDOC00362>). - Guanylate kinase (see <PDOC00670>). - Thymidine kinase (see <PDOC00524>). - Thymidylate
kinase (see <PDOC01034>). - Shikimate kinase (see <PDOC00868>). - Nitrogenase iron protein family (nifH/frxC)
(see <PDOC00580>). - ATP-binding proteins involved in 'active transport' (ABC transporters) [7] (see <PDOC00185>).
- DNA and RNA helicases [8,9,10]. - GTP-binding elongation factors (EF-Tu, EF-1 alpha, EF-G, EF-2, etc.). - Ras family
of GTP-binding proteins (Ras, Rho, Rab, Ral, Ypt1, SEC4, etc.). - Nuclear protein ran (see <PDOC00859>). -
ADP-ribosylation factors family (see <PDOC00781>). - Bacterial dnaA protein (see <PDOC00771>). - Bacterial recA protein
(see <PDOC00131>). - Bacterial recF protein (see <PDOC00539>). - Guanine nucleotide-binding proteins alpha
subunits (Gi, Gs, Gt, G0, etc.). - DNA mismatch repair proteins mutS family (see <PDOC00388>). - Bacterial type II
secretion system protein E (see <PDOC00567>). Not all ATP- or GTP-binding proteins are picked up by this motif. A
number of proteins escape detection because the structure of their ATP-binding site is completely different from that
of the P-loop. Examples of such proteins are the E1-E2 ATPases or the glycolytic kinases. In other ATP- or GTP-binding
proteins the flexible loop exists in a slightly different form; this is the case for tubulins or protein kinases. A special
mention must be reserved for adenylate kinase, in which there is a single deviation from the P-loop pattern: in the last
position Gly is found instead of Ser or Thr.

Consensus pattern: [AG]-x(4)-G-K-[ST]
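A consensus pattern written in this notation can be applied to a sequence by translating it into a regular expression. The sketch below is one possible way to do so in Python and is not the software used by the inventors; the function names `prosite_to_regex` and `scan`, the relaxed pattern `RELAXED_P_LOOP` and both test peptides are assumptions made for illustration only. Only the pattern elements that occur in this document are handled (single residues, bracketed classes, the wildcard x and repeat counts).

```python
import re

# One pattern element: a residue, a [..] class or the wildcard x,
# optionally followed by a repeat count (n) or (m,n).
_ELEMENT = re.compile(r"(\[[A-Z]+\]|[A-Zx])(?:\((\d+)(?:,(\d+))?\))?")

def prosite_to_regex(pattern):
    """Translate a consensus pattern such as [AG]-x(4)-G-K-[ST] into a regex."""
    parts = []
    for element in pattern.strip("-").split("-"):
        m = _ELEMENT.fullmatch(element)
        if m is None:
            raise ValueError(f"unrecognised pattern element: {element!r}")
        core, low, high = m.groups()
        if core == "x":            # x stands for any residue
            core = "."
        if low is None:
            repeat = ""
        elif high is None:
            repeat = "{" + low + "}"
        else:
            repeat = "{" + low + "," + high + "}"
        parts.append(core + repeat)
    return "".join(parts)

def scan(pattern, sequence):
    """Return (1-based start, matched text) for every hit, overlaps included."""
    regex = re.compile("(?=(" + prosite_to_regex(pattern) + "))")
    return [(m.start() + 1, m.group(1)) for m in regex.finditer(sequence)]

P_LOOP = "[AG]-x(4)-G-K-[ST]"
# Hypothetical relaxed form that also accepts Gly in the last position,
# mirroring the adenylate kinase deviation described in the text.
RELAXED_P_LOOP = "[AG]-x(4)-G-K-[STG]"

toy_sequence = "MTEYKLVVVGAGGVGKSALTIQLIQ"   # illustrative peptide
toy_deviant = "GGPGSGKGT"                     # illustrative peptide, Gly in last position

print(scan(P_LOOP, toy_sequence))
print(scan(P_LOOP, toy_deviant))
print(scan(RELAXED_P_LOOP, toy_deviant))
```

With these inputs the strict pattern finds GAGGVGKS starting at residue 10 of the first peptide and nothing in the second, while the relaxed pattern finds GGPGSGKG at residue 1 of the second. This illustrates why a protein whose binding loop deviates in a single position escapes detection by the strict pattern.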

[ 1] Walker J.E., Saraste M., Runswick M.J., Gay N.J. EMBO J. 1:945-951(1982).
[ 2] Moller W., Amons R. FEBS Lett. 186:1-7(1985).
[ 3] Fry D.C., Kuby S.A., Mildvan A.S. Proc. Natl. Acad. Sci. U.S.A. 83:907-911(1986).
[ 4] Dever T.E., Glynias M.J., Merrick W.C. Proc. Natl. Acad. Sci. U.S.A. 84:1814-1818(1987).
[ 5] Saraste M., Sibbald P.R., Wittinghofer A. Trends Biochem. Sci. 15:430-434(1990).
[ 6] Koonin E.V. J. Mol. Biol. 229:1165-1174(1993).
[ 7] Higgins C.F., Hyde S.C., Mimmack M.M., Gileadi U., Gill D.R., Gallagher M.P. J. Bioenerg. Biomembr.
22:571-592(1990).
[ 8] Hodgman T.C. Nature 333:22-23(1988) and Nature 333:578-578(1988) (Errata).
[ 9] Linder P., Lasko P., Ashburner M., Leroy P., Nielsen P.J., Nishi K., Schnier J., Slonimski P.P. Nature
337:121-122(1989).
[10] Gorbalenya A.E., Koonin E.V., Donchenko A.P., Blinov V.M. Nucleic Acids Res. 17:4713-4730(1989).

[0254] 58. Arginase family signatures

The following enzymes have been shown [1] to be evolutionarily related: - Arginase (EC 3.5.3.1), a ubiquitous enzyme
which catalyzes the degradation of arginine to ornithine and urea [2]. - Agmatinase (EC 3.5.3.11) (agmatine
ureohydrolase), a prokaryotic enzyme (gene speB) that catalyzes the hydrolysis of agmatine into putrescine and urea. -
Formiminoglutamase (EC 3.5.3.8) (formiminoglutamate hydrolase), a prokaryotic enzyme (gene hutG) that hydrolyzes
N-formimino-glutamate into glutamate and formamide. - Hypothetical proteins from methanogenic archaebacteria.
These enzymes are proteins of about 300 amino-acid residues. Three conserved regions that contain charged residues
which are involved in the binding of the two manganese ions [3] can be used as signature patterns.
Consensus pattern: [LIVMF]-G-G-x-H-x-[LIVMT]-[STAV]-x-[PAG]-x(3)-[GSTA] [H binds manganese]
Consensus pattern: [LIVM](2)-x-[LIVMFY]-D-[AS]-H-x-D [The two D's and the H bind manganese]
Consensus pattern: [ST]-[LIVMFY]-D-[LIVM]-D-x(3)-[PAQ]-x(3)-P-[GSA]-x(7)-G [The two D's bind manganese]
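Where a family is described by several signature patterns, as for the arginase family, a sequence can be screened against each of them in turn. The following Python sketch shows one way this might be done; the regular expressions are hand translations of the three consensus patterns given for this family, while the names `ARGINASE_SIGNATURES` and `signature_report` and the test peptide are hypothetical and serve only to exercise the code.

```python
import re

# Hand translation of the three arginase-family consensus patterns:
# x -> ".", x(n) -> ".{n}", bracketed classes kept as character classes.
ARGINASE_SIGNATURES = {
    "signature 1": re.compile(r"[LIVMF]GG.H.[LIVMT][STAV].[PAG].{3}[GSTA]"),
    "signature 2": re.compile(r"[LIVM]{2}.[LIVMFY]D[AS]H.D"),
    "signature 3": re.compile(r"[ST][LIVMFY]D[LIVM]D.{3}[PAQ].{3}P[GSA].{7}G"),
}

def signature_report(sequence):
    """Map each signature name to the 1-based start of its first hit, or None."""
    report = {}
    for name, regex in ARGINASE_SIGNATURES.items():
        hit = regex.search(sequence)
        report[name] = hit.start() + 1 if hit else None
    return report

# Made-up peptide assembled from three segments, one per signature,
# separated by short spacers. It is not a real arginase sequence.
segment_1 = "LGGDHSLSIG" + "A" * 3 + "G"
segment_2 = "LVWFDAHAD"
segment_3 = "SFDVD" + "A" * 3 + "P" + "A" * 3 + "PG" + "A" * 7 + "G"
toy = "MK" + segment_1 + "NN" + segment_2 + "NN" + segment_3

print(signature_report(toy))
print(signature_report("A" * 50))
```

For the made-up peptide the three signatures are reported at residues 3, 19 and 30; for the poly-alanine control all three entries are None. A sequence that matches only some of the signatures could be flagged for closer inspection rather than rejected outright, which is a design choice left to the user.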

[ 1] Ouzounis C., Kyrpides N.C. J. Mol. Evol. 39:101-104(1994).

[ 2] Jenkinson C.P., Grody W.W., Cederbaum S.D. Comp. Biochem. Physiol. 114B:107-132(1996).

Pea abscisic acid-responsive proteins ABR17 and ABR18.
Potato pathogenesis-related proteins STH-2 and STH-21.
Soybean stress-induced protein SAM22.

[0267] These proteins are thought to be intracellularly located. They contain from 155 to 160 amino acid residues.
As a signature pattern, a conserved region located in the third quarter of these proteins has been selected.
Consensus pattern: G-x(2)-[LIVMF]-x(4)-E-x(2)-[CSTAEN]-x(8,9)-[GND]-G-[GS]-[CS]-x(2)-K-x(4)-[FY]

[1] Breiteneder H., Pettenburger K., Bito A., Valenta R., Kraft D., Rumpold H., Scheiner O., Breitenbach M. EMBO
J. 8:1935-1938(1989).

[2] Crowell D., John M.E., Russell D., Amasino R.M. Plant Mol. Biol. 18:459-466(1992). 
[3] Warner S.A.J., Scott R., Draper J. Plant Mol. Biol. 19:555-561(1992). 

[0268] 68. bZIP transcription factors basic domain signature 

The bZIP superfamily [1,2] of eukaryotic DNA-binding transcription factors groups together proteins that contain a
basic region mediating sequence-specific DNA-binding followed by a leucine zipper required for dimerization. This
family is quite large, therefore only a partial list of some representative members appears here. - Transcription factor
AP-1, which binds selectively to enhancer elements in the cis control regions of SV40 and metallothionein IIA. AP-1,
also known as c-jun, is the cellular homolog of the avian sarcoma virus 17 (ASV17) oncogene v-jun. - Jun-B and
jun-D, probable transcription factors which are highly similar to jun/AP-1. - The fos protein, a proto-oncogene that forms a
non-covalent dimer with c-jun. - The fos-related proteins fra-1 and fos B. - Mammalian cAMP response element (CRE)
binding proteins CREB, CREM, ATF-1, ATF-3, ATF-4, ATF-5, ATF-6 and LRF-1. - Maize Opaque 2, a trans-acting
transcriptional activator involved in the regulation of the production of zein proteins during endosperm development. -
Arabidopsis G-box binding factors GBF1 to GBF4, Parsley CPRF-1 to CPRF-3, Tobacco TAF-1 and wheat EMBP-1. All these
proteins bind the G-box promoter elements of many plant genes. - Drosophila protein Giant, which represses the
expression of both the kruppel and knirps segmentation gap genes. - Drosophila Box B binding factor 2 (BBF-2), a
transcriptional activator that binds to fat body-specific enhancers of alcohol dehydrogenase and yolk protein genes. -
Drosophila segmentation protein cap'n'collar (gene cnc), which is involved in head morphogenesis. - Caenorhabditis
elegans skn-1, a developmental protein involved in the fate of ventral blastomeres in the early embryo. - Yeast GCN4
transcription factor, a component of the general control system that regulates the expression of amino acid-synthesizing
enzymes in response to amino acid starvation, and the related Neurospora crassa cpc-1 protein. - Neurospora crassa
cys-3, which turns on the expression of structural genes which encode sulfur-catabolic enzymes. - Yeast MET28, a
transcriptional activator of sulfur amino acids metabolism. - Yeast PDR4 (or YAP1), a transcriptional activator of the
genes for some oxygen detoxification enzymes. - Epstein-Barr virus trans-activator protein BZLF1.

Consensus pattern: [KR]-x(1,3)-[RKSAQ]-N-x(2)-[SAQ](2)-x-[RKTAENQ]-x-R-x-[RK]

[ 1] Hurst H.C. Protein Prof. 2:105-168(1995).
[ 2] Ellenberger T. Curr. Opin. Struct. Biol. 4:12-21(1994).

[0269] 69. Biotin-requiring enzymes attachment site

Biotin, which plays a catalytic role in some carboxyl transfer reactions, is covalently attached, via an amide bond, to a
lysine residue in enzymes requiring this coenzyme [1,2,3,4]. Such enzymes are:

- Pyruvate carboxylase (EC 6.4.1.1).
- Acetyl-CoA carboxylase (EC 6.4.1.2).
- Propionyl-CoA carboxylase (EC 6.4.1.3).
- Methylcrotonyl-CoA carboxylase (EC 6.4.1.4).
- Geranoyl-CoA carboxylase (EC 6.4.1.5).
- Urea carboxylase (EC 6.3.4.6).
- Oxaloacetate decarboxylase (EC 4.1.1.3).
- Methylmalonyl-CoA decarboxylase (EC 4.1.1.41).
- Glutaconyl-CoA decarboxylase (EC 4.1.1.70).
- Methylmalonyl-CoA carboxyl-transferase (EC 2.1.3.1) (transcarboxylase).

Sequence data reveal that the region around the biocytin (biotin-lysine) residue is well conserved and can be used as
a signature pattern.

[0270] Consensus pattern: [GN]-[DEQTR]-x-[LIVMFY]-x(2)-[LIVM]-x-[AIV]-M-K-[LMAT]-x(3)-[LIVM]-x-[SAV] [K is the
biotin attachment site]. Note: the domain around the biotin-binding lysine residue is evolutionarily related to that around
the lipoyl-binding lysine residue of 2-oxo acid dehydrogenase acyltransferases.

[ 1] Knowles J.R. Annu. Rev. Biochem. 58:195-221(1989).
[ 2] Samols D., Thornton C.G., Murtif V.L., Kumar G.K., Haase F.C., Wood H.G. J. Biol. Chem. 263:6461-6464(1988).
[ 3] Goss N.H., Wood H.G. Meth. Enzymol. 107:261-270(1984).
[ 4] Shenoy B.C., Xie Y., Park V.L., Kumar G.K., Beegen H., Wood H.G., Samols D. J. Biol. Chem. 267:10407-10412
(1992).

[0271] 2-oxo acid dehydrogenases acyltransferase component lipoyl binding site 

The 2-oxo acid dehydrogenase multienzyme complexes [1,2] from bacterial and eukaryotic sources catalyze the
oxidative decarboxylation of 2-oxo acids to the corresponding acyl-CoA. The three members of this family of multienzyme
complexes are:

- Pyruvate dehydrogenase complex (PDC).
- 2-oxoglutarate dehydrogenase complex (OGDC).
- Branched-chain 2-oxo acid dehydrogenase complex (BCOADC).

These three complexes share a common architecture: they are composed of multiple copies of three component
enzymes: E1, E2 and E3. E1 is a thiamine pyrophosphate-dependent 2-oxo acid dehydrogenase, E2 a dihydrolipamide
acyltransferase, and E3 an FAD-containing dihydrolipamide dehydrogenase.

[0272] E2 acyltransferases have an essential cofactor, lipoic acid, which is covalently bound via an amide linkage to
a lysine group. The E2 components of OGDC and BCOADC bind a single lipoyl group, while those of PDC bind either
one (in yeast and in Bacillus), two (in mammals), or three (in Azotobacter and in Escherichia coli) lipoyl groups [3].

In addition to the E2 components of the three enzymatic complexes described above, a lipoic acid cofactor is also
found in the following proteins:

- H-protein of the glycine cleavage system (GCS) [4]. GCS is a multienzyme complex of four protein components,
  which catalyzes the degradation of glycine. H protein shuttles the methylamine group of glycine from the P protein
  to the T protein. H-protein from either prokaryotes or eukaryotes binds a single lipoic group.
- Mammalian and yeast pyruvate dehydrogenase complexes differ from that of other sources, in that they contain,
  in small amounts, a protein of unknown function, designated protein X or component X. Its sequence is closely
  related to that of E2 subunits and seems to bind a lipoic group [5].
- Fast migrating protein (FMP) (gene acoC) from Alcaligenes eutrophus [6].
  This protein is most probably a dihydrolipamide acyltransferase involved in acetoin metabolism.

A signature pattern was developed which allows the detection of the lipoyl-binding site.

[0273] Consensus pattern: [GN]-x(2)-[LIVF]-x(5)-[LIVFC]-x(2)-[LIVFA]-x(3)-K-[STAIV]-[STAVQDN]-x(2)-[LIVMFS]-
x(5)-[GCN]-x-[LIVMFY] [K is the lipoyl-binding site]. Note: the domain around the lipoyl-binding lysine residue is
evolutionarily related to that around the biotin-binding lysine residue of biotin-requiring enzymes.

[ 1] Yeaman S.J. Biochem. J. 257:625-632(1989).
[ 2] Yeaman S.J. Trends Biochem. Sci. 11:293-296(1986).
[ 3] Russel G.C., Guest J.R. Biochim. Biophys. Acta 1076:225-232(1991).
[ 4] Fujiwara K., Okamura-Ikeda K., Motokawa Y. J. Biol. Chem. 261:8836-8841(1986).
[ 5] Behal R.H., Browning K.S., Hall T.B., Reed L.J. Proc. Natl. Acad. Sci. U.S.A. 86:8732-8736(1989).
[ 6] Priefert H., Hein S., Krueger N., Zeh K., Schmidt B., Steinbuechel A. J. Bacteriol. 173:4056-4071(1991).

[0274] 70. C2 (C2 domain) Number of members: 295

Some isozymes of protein kinase C (PKC) [1,2] contain a domain, known as C2, of about 116 amino-acid residues
which is located between the two copies of the C1 domain (that bind phorbol esters and diacylglycerol) (see
<PDOC00379>) and the protein kinase catalytic domain (see <PDOC00100>). Regions with significant homology
[3,E1] to the C2-domain have been found in the following proteins:

- PKC isoforms alpha, beta and gamma and Drosophila isoforms PKC1 and PKC2.
- PKC isoforms delta, epsilon and eta, Caenorhabditis elegans kin-13 and yeast PKC1 have a C2-like domain at
  the N-terminal extremity [4].
- Yeast cAMP dependent protein kinase SCH9 contains a C2-like domain.
- Mammalian phosphatidylinositol-specific phospholipase C (PI-PLC) (see <PDOC50007>) isoforms beta, gamma
  and delta as well as several non-mammalian PI-PLCs have a C2-like domain C-terminal of the catalytic domain.
- Mammalian and plant phosphatidylinositol-3-kinases have a C2-like domain in the central region of the 110 Kd
  catalytic subunit.
- Yeast phosphatidylserine-decarboxylase 2 (gene PSD2) contains a C2 domain in its central region.
- Cytosolic phospholipase D from plants and cytosolic phospholipase A2 have a C2-like domain at their N-terminus.
- Synaptotagmins (p65). This is a family of related synaptic vesicle proteins that bind acidic phospholipids and that
  may have a regulatory role in the membrane interactions during trafficking of synaptic vesicles at the active zone
  of the synapse. All isoforms of synaptotagmins have two copies of the C2 domain in their C-terminal region.
- Rabphilin-3A, a synaptic protein, contains two C2 domains.
- Caenorhabditis elegans protein unc-13 whose function is not known. Unc-13 has a C2 domain in its central part
  and a C2-like domain at the C-terminus.
- rasGAP and the breakpoint cluster protein bcr have a C2-domain C-terminal of a PH-domain.
- Yeast protein BUD2 (or CLA2) has a C2-domain in the central region.
- Yeast protein RSP5 and human protein NEDD-4; both proteins also contain WW domains (see <PDOC50020>).
- Perforin (see <PDOC00251>) has a C2 domain at the C-terminus. It is the only extracellular protein known to
  contain a C2 domain.
- Yeast hypothetical protein YML072C has a C2 domain.
- Yeast hypothetical protein YNL087W has three C2 domains.
- Caenorhabditis elegans hypothetical protein F37A4.7 has two C2 domains.

The C2 domain is thought to be involved in calcium-dependent phospholipid binding [5]. Since domains related to
the C2 domain are also found in proteins that do not bind calcium, other putative functions for the C2 domain, such as
binding to inositol-1,3,4,5-tetraphosphate, have been suggested [6]. Recently, the 3D structure of the first C2
domain of synaptotagmin has been reported [7]; the domain forms an eight-stranded beta sandwich. The signature
pattern that has been developed for the C2 domain is located in a conserved part of that domain, the connecting
loop between beta strands 2 and 3. A profile has been developed for the C2 domain that covers the total domain.

- Consensus pattern: [ACG]-x(2)-L-x(2,3)-D-x(1,2)-[NGSTLIF]-[GTMR]-x-[STAP]-D-[PA]-[FY]

- Note: this documentation entry is linked to both a signature pattern and a profile. As the profile is much more
  sensitive than the pattern, you should use it if you have access to the necessary software tools to do so.

[0275] [1]Medline: 96367095. Extending the C2 domain family: C2s in PKCs delta, epsilon, eta and theta,
phospholipases, GAPs and perforin. Ponting CP, Parker PJ; Protein Sci 1996;5:162-166.

[ 1] Azzi A., Boscoboinik D., Hensey C. Eur. J. Biochem. 208:547-557(1992).
[ 2] Stabel S. Semin. Cancer Biol. 5:277-284(1994).
[ 3] Brose N., Hofmann K.O., Hata Y., Suedhof T.C. J. Biol. Chem. 270:25273-25280(1995).
[ 4] Sossin W.S., Schwartz J.H. Trends Biochem. Sci. 18:207-208(1993).
[ 5] Davletov B.A., Suedhof T.C. J. Biol. Chem. 268:26386-26390(1993).
[ 6] Fukuda M., Aruga J., Niinobe M., Aimoto S., Mikoshiba K. J. Biol. Chem. 269:29206-29211(1994).
[ 7] Sutton R.B., Davletov B.A., Berghuis A.M., Suedhof T.C., Sprang S.R. Cell 80:929-938(1995).

[0276] 71. CAP (CAP protein) Number of members: 11
In budding and fission yeasts the CAP protein is a bifunctional protein whose N-terminal domain binds to adenylyl
cyclase, thereby enabling that enzyme to be activated by upstream regulatory signals, such as Ras. The function of
the C-terminal domain is less clear, but it is required for normal cellular morphology and growth control [1]. CAP is
conserved in higher eukaryotic organisms where its function is not yet clear [2].

Structurally, CAP is a protein of 474 to 551 residues which consists of two domains separated by a proline-rich hinge.
Two signature patterns, one corresponding to a conserved region in the N-terminal extremity and the other to a
C-terminal region, have been developed.

- Consensus pattern: [LIVM](2)-x-R-L-[DE]-x(4)-R-L-E

- Consensus pattern: D-[LIVMFY]-x-E-x-[PA]-x-P-E-Q-[LIVMFY]-K

[ 1] Kawamukai M., Gerst J., Field J., Riggs M., Rodgers L., Wigler M., Young D. Mol. Biol. Cell 3:167-180(1992).
[ 2] Yu G., Swiston J., Young D. J. Cell Sci. 107:1671-1678(1994).

[0277] 72. CAP_GLY (CAP-Gly domain)
CAP stands for cytoskeleton-associated proteins. Swiss:P39937 may be a member but has not been included. It has
a weak match to the family between residues 22-67. Number of members: 24
[1]Medline: 93242656. Sequence homologies between four cytoskeleton-associated proteins. Riehemann K, Sorg C;
Trends Biochem Sci 1993;18:82-83.

[0278] It has been shown [1] that some cytoskeleton-associated proteins (CAP) share the presence of a conserved,
glycine-rich domain of about 42 residues, called here CAP-Gly. Proteins known to contain this domain are listed below.

- Restin (also known as cytoplasmic linker protein-170 or CLIP-170), a 160 Kd protein associated with intermediate
  filaments and that links endocytic vesicles to microtubules. Restin contains two copies of the CAP-Gly domain.
- Vertebrate dynactin (150 Kd dynein-associated polypeptide; DAP) and Drosophila glued, a major component of
  activator I, a 20S polypeptide complex that stimulates dynein-mediated vesicle transport.
- Yeast protein BIK1, which seems to be required for the formation or stabilization of microtubules during mitosis and
  for spindle pole body fusion during conjugation.
- Yeast protein NIP100 (NIP80).
- Human protein CKAP1/TFCB, Schizosaccharomyces pombe protein alp11 and Caenorhabditis elegans
  hypothetical protein F53F4.3. These proteins contain an N-terminal ubiquitin domain (see <PDOC00271>) and a C-terminal
  CAP-Gly domain.
- Caenorhabditis elegans hypothetical protein M01A8.2.
- Yeast hypothetical protein YNL148c.

Structurally, these proteins are made of three distinct parts: an N-terminal section that is most probably globular and
contains the CAP-Gly domain, a large central region predicted to be in an alpha-helical coiled-coil conformation and,
finally, a short C-terminal globular domain. The signature for the CAP-Gly domain corresponds to the first 32 residues
of the domain and includes five of the six conserved glycines.

- Consensus pattern: G-x(8,10)-[FYW]-x-G-[LIVM]-x-[LIVMFY]-x(4)-G-K-[NH]-x-G-[STAR]-x(2)-G-x(2)-[LY]-F

[ 1] Riehemann K., Sorg C. Trends Biochem. Sci. 18:82-83(1993).

[0279] 73. (CBD1)

Cellulose-binding domain, fungal type 

The microbial degradation of cellulose and xylans requires several types of enzymes such as endoglucanases (EC
3.2.1.4), cellobiohydrolases (EC 3.2.1.91) (exoglucanases), or xylanases (EC 3.2.1.8) [1].

[0280] Structurally, cellulases and xylanases generally consist of a catalytic domain joined to a cellulose-binding
domain (CBD) by a short linker sequence rich in proline and/or hydroxy-amino acids.

[0281] The CBD of a number of fungal cellulases has been shown to consist of 36 amino acid residues. Enzymes
known to contain such a domain are:

- Endoglucanase I (gene egl1) from Trichoderma reesei.
- Endoglucanase II (gene egl2) from Trichoderma reesei.
- Endoglucanase V (gene egl5) from Trichoderma reesei.
- Exocellobiohydrolase I (gene CBHI) from Humicola grisea, Neurospora crassa, Phanerochaete chrysosporium,
  Trichoderma reesei, and Trichoderma viride.
- Exocellobiohydrolase II (gene CBHII) from Trichoderma reesei.
- Exocellobiohydrolase 3 (gene cel3) from Agaricus bisporus.
- Endoglucanases B, C2, F and K from Fusarium oxysporum.

[0282] The CBD domain is found either at the N-terminal (Cbh-II or egl2) or at the C-terminal extremity (Cbh-I, egl1
or egl5) of these enzymes. As shown in the following schematic representation, there are four conserved cysteines
in this type of CBD domain, all involved in disulfide bonds.

xxxxxxxCxxxxxxxxxxCxxxxxCxxxxxxxxxCx
       ****************************

'C': conserved cysteine involved in a disulfide bond.
'*': position of the pattern.

[0283] Such a domain has also been found in a putative polysaccharide binding protein from the red alga, Porphyra
purpurea [2]. Structurally, this protein consists of four tandem repeats of the CBD domain.

[0284] Consensus pattern: C-G-G-x(4,7)-G-x(3)-C-x(5)-C-x(3,5)-[NHG]-x-[FYWM]-x(2)-Q-C [The four C's are
involved in disulfide bonds]. Sequences known to belong to this class detected by the pattern: ALL.

[ 1] Gilkes N.R., Henrissat B., Kilburn D.G., Miller R.C. Jr., Warren R.A.J. Microbiol. Rev. 55:303-315(1991).
[ 2] Liu Q., der Meer J.P., Reith M.E.

[0285] 74. CBS domain. 3D structure found as a subdomain in TIM barrel of inosine-. CBS domain web page. CBS
domains are small intracellular modules mostly found in 2 or four copies within a protein. CBS domains are found in
cystathionine-beta-synthase (CBS), where mutations lead to homocystinuria. Two CBS domains are found in
inosine-monophosphate dehydrogenase from all species; however, the CBS domains are not needed for activity. Two CBS
domains are found in intracellular loops of several chloride channels. Mutations in this domain of Swiss:P35520 lead
to homocystinuria. Number of members: 414

[1]Medline: 97172695. The structure of a domain common to archaebacteria and the homocystinuria disease
protein. Bateman A; Trends Biochem Sci 1997;22:12-13.
[2]Medline: 96279836. Structure and mechanism of inosine monophosphate dehydrogenase in complex with the
immunosuppressant mycophenolic acid. Sintchak MD, Fleming MA, Futer O, Raybuck SA, Chambers SP, Caron
PR, Murcko MA, Wilson KP; Cell 1996;85:921-930.
Discovery of CBS domain.
[3]Medline: 97259972. CBS domains in ClC chloride channels implicated in myotonia and nephrolithiasis (kidney
stones). Ponting CP; J Mol Med 1997;75:160-163.

[0286] 75. CDP-OH_P_transf (CDP-alcohol phosphatidyltransferase)
All of these members have the ability to catalyze the displacement of CMP from a CDP-alcohol by a second alcohol
with formation of a phosphodiester bond and concomitant breaking of a phosphoride anhydride bond. Number of
members: 32

A number of phosphatidyltransferases, which are all involved in phospholipid biosynthesis and that share the property
of catalyzing the displacement of CMP from a CDP-alcohol by a second alcohol with formation of a phosphodiester
bond and concomitant breaking of a phosphoride anhydride bond, share a conserved sequence region [1,2]. These
enzymes are:

- Ethanolaminephosphotransferase (EC 2.7.8.1) from yeast (gene EPT1).
- Diacylglycerol cholinephosphotransferase (EC 2.7.8.2) from yeast (gene CPT1).
- Phosphatidylglycerophosphate synthase (EC 2.7.8.5) (CDP-diacylglycerol-glycerol-3-phosphate
  3-phosphatidyltransferase) from bacteria (gene pgsA).
- Phosphatidylserine synthase (EC 2.7.8.8) (CDP-diacylglycerol-serine O-phosphatidyltransferase) from yeast
  (gene CHO1) and from Bacillus subtilis (gene pssA).
- Phosphatidylinositol synthase (EC 2.7.8.11) (CDP-diacylglycerol-inositol 3-phosphatidyltransferase) from yeast
  (gene PIS).

These enzymes are proteins of from 200 to 400 amino acid residues. The conserved region contains three aspartic
acid residues and is located in the N-terminal section of the sequences.

- Consensus pattern: D-G-x(2)-A-R-x(8)-G-x(3)-D-x(3)-D

[1]Medline: 97075020. Two-dimensional 1H-NMR of transmembrane peptides from Escherichia coli
phosphatidylglycerophosphate synthase in micelles. Morein S, Trouard TP, Hauksson JB, Rilfors L, Arvidson G, Lindblom G; Eur J
Biochem 1996;241:489-497.

[ 1] Nikawa J.-I., Kodaki T., Yamashita S. J. Biol. Chem. 262:4876-4881(1987).
[ 2] Hjelmstad R.H., Bell R.M. J. Biol. Chem. 266:5094-5134(1991).

[0287] 76. CHOD (Cholesterol oxidase) Members of the GMC oxidoreductase family. Number of members: 3
[0288] [1]Medline: 94032271. Crystal structure of cholesterol oxidase complexed with a steroid substrate:
implications for flavin adenine dinucleotide dependent alcohol oxidases. Li J, Vrielink A, Brick P, Blow DM; Biochemistry 1993;

[ 2] Reed J.C., Bidwai A.P., Glover C.V.C. J. Biol. Chem. 269:18192-18200(1994).
[0293] 79. CLP_protease (Clp protease)

These proteins belong to family S14 in the classification of peptidases.

- The Clp protease has an active site catalytic triad. In E. coli Clp protease, ser-111, his-136 and asp-185 form
  the catalytic triad.
- Swiss:P48254 has lost all of these active site residues and is therefore inactive.
- Swiss:P42379 contains two large insertions, Swiss:P42380 contains one large insertion.

Number of members: 38

[0294] The endopeptidase Clp (EC 3.4.21.92) from Escherichia coli cleaves peptides in various proteins in a process
that requires ATP hydrolysis [1,2]. Clp is a dimeric protein which consists of a proteolytic subunit (gene clpP) and either
of two related ATP-binding regulatory subunits (genes clpA and clpX). ClpP is a serine protease which has a
chymotrypsin-like activity. Its catalytic activity seems to be provided by a charge relay system similar to that of the trypsin
family of serine proteases, but which evolved by independent convergent evolution. Proteases highly similar to ClpP
have been found to be encoded in the genome of the chloroplast of plants and seem to be also present in other
eukaryotes. The sequences around two of the residues involved in the catalytic triad (a serine and a histidine) are
highly conserved and can be used as signature patterns specific to that category of proteases.

- Consensus pattern: T-x(2)-[LIVMF]-G-x-A-[SAC]-S-[MSA]-[PAG]-[STA] [S is the active site residue]

- Consensus pattern: R-x(3)-[EAP]-x(3)-[LIVMFYT]-M-[LIVM]-H-Q-P [H is the active site residue]
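Because each of the two Clp signatures annotates one residue as the active site, a match can be used to report the position of that residue and not only the position of the match. The Python sketch below illustrates the idea under stated assumptions: the two regular expressions are hand translations of the consensus patterns above, with the annotated residue placed in a named group, and the names `CLP_SER`, `CLP_HIS` and `active_site_positions` as well as both test peptides are hypothetical.

```python
import re

# Hand translation of the two Clp protease signatures. The annotated
# active-site residue is captured in the named group "site".
CLP_SER = re.compile(r"T.{2}[LIVMF]G.A[SAC](?P<site>S)[MSA][PAG][STA]")
CLP_HIS = re.compile(r"R.{3}[EAP].{3}[LIVMFYT]M[LIVM](?P<site>H)QP")

def active_site_positions(regex, sequence):
    """Return the 1-based position of the annotated residue for each hit."""
    return [m.start("site") + 1 for m in regex.finditer(sequence)]

# Made-up peptides used only to exercise the two expressions.
toy_ser = "VSTICMGQAASMGAFLL"
toy_his = "RAAAEAAALMIHQP"

print(active_site_positions(CLP_SER, toy_ser))
print(active_site_positions(CLP_HIS, toy_his))
```

For these two peptides the sketch reports the serine at residue 11 and the histidine at residue 12. Applied to a full-length sequence, positions obtained in this way could be compared with annotated catalytic residues such as the ser-111 and his-136 mentioned for the E. coli enzyme.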

[1]Medline: 98050920. The structure of ClpP at 2.3 angstroms resolution suggests a model for ATP-dependent
proteolysis. Wang J, Hartling JA, Flanagan JM; Cell 1997;91:447-456.
[ 1] Maurizi M.R., Clark W.P., Kim S.-K., Gottesman S. J. Biol. Chem. 265:12546-12552(1990).
[ 2] Gottesman S., Maurizi M.R. Microbiol. Rev. 56:592-621(1992).
[ 3] Rawlings N.D., Barrett A.J. Meth. Enzymol. 244:19-61(1994).

[0295] 80. CNG_membrane (Transmembrane region cyclic Nucleotide Gated Channel) 
[1] Medline: 94224763. Cyclic nucleotide-gated channels: an expanding new family of ion channels. Yau KW; Proc Natl
Acad Sci USA 1994;91:3481-3483.

This family is found N-terminal to the cNMP_binding domain. Number of members: 56. Proteins that bind cyclic
nucleotides (cAMP or cGMP) share a structural domain of about 120 residues [1-3]. The best studied of these proteins is
the prokaryotic catabolite gene activator (also known as the cAMP receptor protein) (gene crp) where such a domain
is known to be composed of three alpha-helices and a distinctive eight-stranded, antiparallel beta-barrel structure.
Such a domain is known to exist in the following proteins:

- Prokaryotic catabolite gene activator protein (CAP).

- cAMP- and cGMP-dependent protein kinases (cAPK and cGPK). Both types of kinases contain two tandem copies
of the cyclic nucleotide-binding domain. The cAPKs are composed of two different subunits: a catalytic chain and
a regulatory chain which contains both copies of the domain. The cGPKs are single chain enzymes that include
the two copies of the domain in their N-terminal section. The nucleotide specificity of cAPK and cGPK is due to
an amino acid in the conserved region of beta-barrel 7: a threonine that is invariant in cGPK is an alanine in most
cAPK.

- Vertebrate cyclic nucleotide-gated ion channels. Two such cation channels have been fully characterized. One
is found in rod cells where it plays a role in visual signal transduction. It specifically binds cGMP, leading to an
opening of the channel and thereby causing a depolarization of rod photoreceptors. In olfactory epithelium a similar,
cAMP-binding, channel plays a role in odorant signal transduction.

There are six invariant amino acids in this domain, three of which are glycine residues that are thought to be
essential for maintenance of the structural integrity of the beta-barrel. Two signature patterns have been developed
for this domain. The first pattern is located within beta-barrels 2 and 3 and contains the first two conserved Gly.
The second pattern is located within beta-barrels 6 and 7 and contains the third conserved Gly as well as the three
other invariant residues.

- Consensus pattern: [LIVM]-[VIC]-x(2)-G-[DENQTA]-x-[GAC]-x(2)-[LIVMFY](4)-x(2)-G

- Consensus pattern: [LIVMF]-G-E-x-[GAS]-[LIVM]-x(5,11)-R-[STAQ]-A-x-[LIVMA]-x-[STACV]

[ 1] Weber I.T., Shabb J.B., Corbin J.D. Biochemistry 28:6122-6127(1989).
[ 2] Kaupp U.B. Trends Neurosci. 14:150-157(1991).
[ 3] Shabb J.B., Corbin J.D. J. Biol. Chem. 267:5723-5726(1992).

[0296] 81. COX10_ctaB_cyoE (Cytochrome c oxidase assembly factor)
[1]

Medline: 95191390

Biosynthesis and functional role of haem O and haem A.
Mogi T, Saiki K, Anraku Y;
Mol Microbiol 1994;14:391-398.

Cytochrome c oxidase is a multi-subunit enzyme. The complexity of this enzyme requires assistance in building the
complex.
This is carried out by the Cytochrome c oxidase assembly factor.
Number of members: 31

[0297] Cytochrome c oxidase is an oligomeric enzymatic complex which seems to require the aid of a number of
proteins that either act as chaperonins to help the subunits of the enzyme to fold correctly, or assist in the assembly
of the metal centers [1]. One of these subunits is known as COX10 in yeast and as ctaB [2] in aerobic prokaryotes. It
is evolutionarily related to the cyoE protein from the Escherichia coli cytochrome O terminal oxidase complex.
[0298] These proteins probably contain [3] seven transmembrane segments. The most conserved region is located
in a loop between the second and third of these segments and has been selected as a signature pattern.

- Consensus pattern: [ED]-x-D-x(2)-M-x-R-T-x(2)-R-x(4)-G

[ 1] Nobrega M.P., Nobrega F.G., Tzagoloff A.
J. Biol. Chem. 265:14220-14226(1990).
[ 2] Cao J., Hosler J., Shapleigh J., Revzin A., Ferguson-Miller S.
J. Biol. Chem. 267:24273-24278(1992).
[ 3] Chepuri V., Gennis R.B.
J. Biol. Chem. 265:12978-12986(1990).

[0299] 82. COX3 (Cytochrome c oxidase subunit III) 
This family corresponds to chains c and p.

[1]

Medline: 96216288
The whole structure of the 13-subunit oxidized cytochrome c
oxidase at 2.8 A.
Tsukihara T, Aoyama H, Yamashita E, Tomizaki T, Yamaguchi H,
Shinzawa-Itoh K, Nakashima R, Yaono R, Yoshikawa S;
Science 1996;272:1136-1144.
Number of members: 224

[0300] 83. COX5B (Cytochrome c oxidase subunit Vb) 
[1]

Medline: 96216288

The whole structure of the 13-subunit oxidized cytochrome c oxidase at 2.8 A.
Tsukihara T, Aoyama H, Yamashita E, Tomizaki T, Yamaguchi H, Shinzawa-Itoh K, Nakashima R, Yaono R, Yoshikawa S;
Science 1996;272:1136-1144.

This family consists of chains F and S.
Number of members: 10

[0301] Cytochrome c oxidase (EC 1.9.3.1) [1] is an oligomeric enzymatic complex which is a component of the
respiratory chain complex and is involved in the transfer of electrons from cytochrome c to oxygen. In eukaryotes this
enzyme complex is located in the mitochondrial inner membrane; in aerobic prokaryotes it is found in the plasma
membrane. In addition to the three large subunits that form the catalytic center of the enzyme complex there are, in
eukaryotes, a variable number of small polypeptidic subunits. One of these subunits, which is known as Vb in mammals,
V in slime mold and IV in yeast, binds a zinc atom. The sequence of subunit Vb is well conserved and includes three
conserved cysteines that are thought to coordinate the zinc ion [2]. Two of these cysteines are clustered in the
C-terminal section of the subunit; this region has been selected as a signature pattern.

- Consensus pattern: [LIVM](2)-[FYW]-x(10)-C-x(2)-C-G-x(2)-[FY]-K-L [The two C's probably bind zinc]

[ 1] Capaldi R.A., Malatesta F., Darley-Usmar V.M. Biochim. Biophys. Acta 726:135-148(1983).
[ 2] Rizzuto R., Sandona D., Brini M., Capaldi R.A., Bisson R. Biochim. Biophys. Acta 1129:100-104(1991).
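
When a signature pattern annotates particular residues, as the subunit Vb pattern above does for its two probable zinc-binding cysteines, capture groups allow the positions of those residues to be reported along with the match. The sketch below hand-translates the pattern into a Python regular expression; the function name and the test peptide are hypothetical and are not taken from any real subunit Vb sequence.

```python
import re
from typing import List, Tuple

# Hand translation of [LIVM](2)-[FYW]-x(10)-C-x(2)-C-G-x(2)-[FY]-K-L.
# Each of the two cysteines sits in its own capture group.
COX5B_SIGNATURE = re.compile(r"[LIVM]{2}[FYW].{10}(C).{2}(C)G.{2}[FY]KL")


def zinc_cysteine_positions(sequence: str) -> List[Tuple[int, int]]:
    """Return the 1-based positions of both signature cysteines per match."""
    return [
        (m.start(1) + 1, m.start(2) + 1)
        for m in COX5B_SIGNATURE.finditer(sequence)
    ]


# Hypothetical peptide built to satisfy the pattern (illustration only).
toy_peptide = "GS" + "LVF" + "A" * 10 + "CEHCG" + "THYKL"
print(zinc_cysteine_positions(toy_peptide))
```

The third conserved cysteine mentioned in the text lies outside the signature region, so it is not reported by this sketch.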

[0302] 84. COesterase (Carboxylesterases) 
Cholinesterase pages

The prints entry is specific to acetylcholinesterase 
Number of members: 273 

[0303] Higher eukaryotes have many distinct esterases. Among the different types are those which act on carboxylic
esters (EC 3.1.1.-). Carboxyl-esterases have been classified into three categories (A, B and C) on the basis of
differential patterns of inhibition by organophosphates. The sequence of a number of type-B carboxylesterases indicates
[1,2,3] that the majority are evolutionarily related. This family currently consists of the following proteins:

- Acetylcholinesterase (EC 3.1.1.7) (AChE) [E1] from vertebrates and from Drosophila.

- Mammalian cholinesterase II (butyryl cholinesterase) (EC 3.1.1.8). Acetylcholinesterase and cholinesterase II are
closely related enzymes that hydrolyze choline esters [4].

- Mammalian liver microsomal carboxylesterases (EC 3.1.1.1).

- Drosophila esterase 6, produced in the anterior ejaculatory duct of the male insect reproductive system where it
plays an important role in its reproductive biology.

- Drosophila esterase P.

- Culex pipiens (mosquito) esterases B1 and B2.

- Myzus persicae (peach-potato aphid) esterases E4 and FE4.

- Mammalian bile-salt-activated lipase (BAL) [5], a multifunctional lipase which catalyzes fat and vitamin absorption.
It is activated by bile salts in infant intestine where it helps to digest milk fats.

- Insect juvenile hormone esterase (JH esterase) (EC 3.1.1.59).

- Lipases (EC 3.1.1.3) from the fungi Geotrichum candidum and Candida rugosa.

- Caenorhabditis gut esterase (gene ges-1).

- Duck fatty acyl-CoA hydrolase, medium chain (EC 3.1.2.14), an enzyme that may be associated with peroxisome
proliferation and may play a role in the production of 3-hydroxy fatty acid diester pheromones.

- Membrane enclosed crystal proteins from slime mold. These proteins are most probably esterases; the vesicles
where they are found have therefore been termed esterosomes.

[0304] So far two bacterial proteins have been found to belong to this family: 


- Phenmedipham hydrolase (phenylcarbamate hydrolase), an Arthrobacter oxidans plasmid-encoded enzyme (gene
pcd) that degrades the phenylcarbamate herbicides phenmedipham and desmedipham by hydrolyzing their central
carbamate linkages.

- Para-nitrobenzyl esterase from Bacillus subtilis (gene pnbA).

[0305] The following proteins, while having lost their catalytic activity, contain a domain evolutionarily related to that
of carboxylesterases type-B:

- Thyroglobulin (TG), a glycoprotein specific to the thyroid gland, which is the precursor of the iodinated thyroid
hormones thyroxine (T4) and triiodothyronine (T3).

- Drosophila protein neurotactin (gene nrt) which may mediate or modulate cell adhesion between embryonic cells
during development.

- Drosophila protein glutactin (gene glt), whose function is not known.

[0306] As is the case for lipases and serine proteases, the catalytic apparatus of esterases involves three residues
(catalytic triad): a serine, a glutamate or aspartate and a histidine. The sequence around the active site serine is well
conserved and can be used as a signature pattern. A conserved region located in the N-terminal section containing a
cysteine involved in a disulfide bond has been selected as a second signature pattern.

- Consensus pattern: F-[GR]-G-x(4)-[LIVM]-x-[LIV]-x-G-x-S-[STAG]-G [S is the active site residue]

- Consensus pattern: [ED]-D-C-L-[YT]-[LIV]-[DNS]-[LIV]-[LIVFYW]-x-[PQR] [C is involved in a disulfide bond]

[ 1] Myers M., Richmond R.C., Oakeshott J.G. Mol. Biol. Evol. 5:113-119(1988).
[ 2] Krejci E., Duval N., Chatonnet A., Vincens P., Massoulie J. Proc. Natl. Acad. Sci. U.S.A. 88:6647-6651(1991).
[ 3] Cygler M., Schrag J.D., Sussman J.L., Harel M., Silman I., Gentry M.K., Doctor B.P. Protein Sci. 2:366-382

[1]

Medline: 90285162

Mammalian carbamyl phosphate synthetase (CPS). DNA sequence and evolution of the CPS domain of the Syrian
hamster multifunctional protein CAD.
Simmer JP, Kelly RE, Rinker AG Jr, Scully JL, Evans DR;
J Biol Chem 1990;265:10395-10402.

The carbamoyl-phosphate synthase domain is in the amino terminus of the protein.
Carbamoyl-phosphate synthase catalyzes the ATP-dependent synthesis of carbamyl-phosphate from glutamine or
ammonia and bicarbonate. This important enzyme initiates both the urea cycle and the biosynthesis of arginine and/or
pyrimidines [1].

The carbamoyl-phosphate synthase (CPS) enzyme in prokaryotes is a heterodimer of a small and large chain. The
small chain promotes the hydrolysis of glutamine to ammonia, which is used by the large chain to synthesize carbamoyl
phosphate. See CPSase_L_chain.

The small chain has a GATase domain in the carboxyl terminus.
See GATase.

Number of members: 46

[0315] Carbamoyl-phosphate synthase (CPSase) catalyzes the ATP-dependent synthesis of carbamyl-phosphate 
from glutamine (EC 6.3.5.5) or ammonia (EC 6.3.4.16) and bicarbonate [1]. This important enzyme initiates both the 
urea cycle and the biosynthesis of arginine and pyrimidines. 

[0316] Glutamine-dependent CPSase (CPSase II) is involved in the biosynthesis of pyrimidines and purines. In
bacteria such as Escherichia coli, a single enzyme is involved in both biosynthetic pathways while other bacteria have
separate enzymes. The bacterial enzymes are formed of two subunits: a small chain (gene carA) that provides
glutamine amidotransferase activity (GATase) necessary for removal of the ammonia group from glutamine, and a
large chain (gene carB) that provides CPSase activity. Such a structure is also present in fungi for arginine biosynthesis
(genes CPA1 and CPA2). In most eukaryotes, the first three steps of pyrimidine biosynthesis are catalyzed by a large
multifunctional enzyme - called URA2 in yeast, rudimentary in Drosophila and CAD in mammals [2]. The CPSase
domain is located between an N-terminal GATase domain and the C-terminal part which encompasses the dihydroorotase
and aspartate transcarbamylase activities.

[0317] Ammonia-dependent CPSase (CPSase I) is involved in the urea cycle in ureolytic vertebrates; it is a
monofunctional protein located in the mitochondrial matrix.

[0318] The CPSase domain is typically 120 Kd in size and has arisen from the duplication of an ancestral subdomain
of about 500 amino acids. Each subdomain independently binds to ATP and it is suggested that the two homologous
halves act separately, one to catalyze the phosphorylation of bicarbonate to carboxy phosphate and the other that of
carbamate to carbamyl phosphate.

[0319] The CPSase subdomain is also present in a single copy in the biotin-dependent enzymes acetyl-CoA
carboxylase (EC 6.4.1.2) (ACC), propionyl-CoA carboxylase (EC 6.4.1.3) (PCCase), pyruvate carboxylase (EC 6.4.1.1)
(PC) and urea carboxylase (EC 6.3.4.6).

[0320] Two conserved regions which are probably important for binding ATP and/or catalytic activity have been
selected as signatures for the subdomain.

- Consensus pattern: [FYV]-[PS]-[LIVMC]-[LIVMA]-[LIVM]-[KR]-[PSA]-[STA]-x(3)-[SG]-G-x-[AG]

- Consensus pattern: [LIVMF]-[LIMN]-E-[LIVMCA]-N-[PATLIVM]-[KR]-[LIVMSTAC]

[ 1] Simmer J.P., Kelly R.E., Rinker A.G. Jr., Scully J.L., Evans D.R. J. Biol. Chem. 265:10395-10402(1990).
[ 2] Davidson J.N., Chen K.C., Jamison R.S., Musmanno L.A., Kern C.B. BioEssays 15:157-164(1993).

[0321] 87. CRAL_TRIO (CRAL/TRIO domain)
[1]

Medline: 98121119

Crystal structure of the Saccharomyces cerevisiae phosphatidylinositol-transfer protein.
Sha B, Phillips SE, Bankaitis VA, Luo M;
Nature 1998;391:506-510.

The original profile has been extended to include the carboxyl domain from the known structure of Sec14.
Swiss:P10911 has not been included in the Pfam family because it does not appear to contain a complete structural domain.
Number of members: 39

[0322] 88. CSD ('Cold-shock' DNA-binding domain)
[1]

Medline: 94255482

Crystal structure of CspA, the major cold shock protein of Escherichia coli.
Schindelin H, Jiang W, Inouye M, Heinemann U;
Proc Natl Acad Sci USA 1994;91:5119-5123.
Number of members: 121

[0323] A conserved domain of about 70 amino acids has been found in prokaryotic and eukaryotic DNA-binding
proteins [1,2,3,E1]. This domain, which is known as the 'cold-shock domain' (CSD), is present in the proteins listed below.

- Escherichia coli protein CS7.4 (gene cspA) which is induced in response to low temperature (cold-shock protein)
and which binds to and stimulates the transcription of the CCAAT-containing promoters of the HN-S protein and
of gyrA.

- Mammalian Y box binding protein 1 (YB1). A protein that binds to the CCAAT-containing Y box of mammalian HLA
class II genes.

- Xenopus Y box binding proteins -1 and -2 (Y1 and Y2). Proteins that bind to the CCAAT-containing Y box of
Xenopus hsp70 genes.

- Xenopus B box binding protein (YB3). YB3 binds the B box promoter element of genes transcribed by RNA
polymerase III.

- Enhancer factor I subunit A (EFI-A) (dbpB). A protein that also binds to the CCAAT motif in various gene promoters.

- DbpA, a human DNA-binding protein of unknown specificity.

- Bacillus subtilis cold-shock proteins cspB and cspC.

- Streptomyces clavuligerus protein SC 7.0.

- Escherichia coli proteins cspB, cspC, cspD, cspE and cspR.

- Unr, a mammalian gene encoded upstream of the N-ras gene. Unr contains nine repeats that are similar to the
CSD domain. The function of Unr is not yet known but it could be a multivalent DNA-binding protein.

[0324] As a signature pattern for the CSD domain, its most conserved region, which is located in its N-terminal section,
has been selected. It must be noted that the beginning of this region is highly similar [4] to the RNP-1 RNA-binding motif.

- Consensus pattern: [FY]-G-F-I-x(6,7)-[DER]-[LIVM]-F-x-H-x-[STKR]-x-[LIVMFY]

[ 1] Doniger J., Landsman D., Gonda M.A., Wistow G.
New Biol. 4:389-395(1992).
[ 2] Wistow G.
Nature 344:823-824(1990).
[ 3] Jones P.G., Inouye M.
Mol. Microbiol. 11:811-818(1994).
[ 4] Landsman D.
Nucleic Acids Res. 20:2861-2864(1992).

[0325] 89. CTF_NFI (CTF/NF-I family)
Number of members: 45 

[0326] Nuclear factor I (NF-I) or CCAAT box-binding transcription factor (CTF) [1,2] (also known as TGGCA-binding
proteins) are a family of vertebrate nuclear proteins which recognize and bind, as dimers, the palindromic DNA
sequence 5'-TGGCANNNTGCCA-3'. CTF/NF-I binding sites are present in viral and cellular promoters and in the origin
of DNA replication of Adenovirus type 2.
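
As an illustration of how candidate CTF/NF-I binding sites could be located in a DNA sequence, the following minimal Python sketch searches for the consensus 5'-TGGCANNNTGCCA-3' with a regular expression, treating N as any of A, C, G or T. The function names and the test sequence are hypothetical. Because the consensus is palindromic, the reverse complement of any matching site also matches, so scanning a single strand is sufficient.

```python
import re
from typing import List, Tuple

# Lookahead form so that overlapping candidate sites are all reported.
NFI_SITE = re.compile(r"(?=(TGGCA[ACGT]{3}TGCCA))")

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna: str) -> str:
    """Return the reverse complement of an upper-case DNA string."""
    return dna.translate(_COMPLEMENT)[::-1]


def find_nfi_sites(dna: str) -> List[Tuple[int, str]]:
    """Return (0-based start, matched site) for every consensus match."""
    dna = dna.upper()
    return [(m.start(1), m.group(1)) for m in NFI_SITE.finditer(dna)]


# Hypothetical test sequence containing a single site.
toy_dna = "aattTGGCAcgtTGCCAgg"
sites = find_nfi_sites(toy_dna)
print(sites)

# The reverse complement of a site is itself a consensus match.
for _, site in sites:
    assert NFI_SITE.search(reverse_complement(site))
```

A consensus match is only a candidate; actual binding depends on sequence context not captured by this pattern.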

[0327] The CTF/NF-I proteins were first identified as nuclear factor I, a collection of proteins that activate the
replication of several Adenovirus serotypes (together with NF-II and NF-III) [3]. The family of proteins was also identified
as the CTF transcription factors, before the NFI and CTF families were found to be identical [4]. The CTF/NF-I proteins
are individually capable of activating transcription and DNA replication. The CTF/NF-I family is also referred to as
NFI, NF-I or NF1.

[0328] In a given species, there are a large number of different CTF/NF-I proteins. The multiplicity of CTF/NF-I is 
known to be generated both by alternative splicing and by the occurrence of four different genes. The known forms of 
NF-I genes have been classified as: 

- The CTF-like factors subfamily (prototype form: CTF-1) [4].
- The NFI-X proteins.
- The NFI-A proteins.
- The NFI-B proteins.

All of the members in this family utilise acyl-CoA as the acceptor molecule.
Number of members: 47 

[0337] 92. Chal_stil_synt (Chalcone and stilbene synthases)
Number of members: 146

[0338] Chalcone synthases (CHS) (EC 2.3.1.74) and stilbene synthases (STS) (formerly known as resveratrol
synthases) are related plant enzymes [1]. CHS is an important enzyme in flavonoid biosynthesis and STS a key enzyme
in stilbene-type phytoalexin biosynthesis. Both enzymes catalyze the addition of three molecules of malonyl-CoA to a
starter CoA ester (a typical example is 4-coumaroyl-CoA), producing either a chalcone (with CHS) or stilbene (with
STS).

[0339] These enzymes are proteins of about 390 amino-acid residues. A conserved cysteine residue, located in the
central section of these proteins, has been shown [2] to be essential for the catalytic activity of both enzymes and
probably represents the binding site for the 4-coumaroyl-CoA group. The region around this active site residue is well
conserved and can be used as a signature pattern.

[0340] In addition to the plant enzymes, this family also includes Bacillus subtilis bcsA.

- Consensus pattern: R-[LIVMFYS]-x-[LIVM]-x-[QHG]-x-G-C-[FYNA]-[GA]-G-[GA]-[STAV]-x-[LIVMF]-[RA] [C is the
active site residue]

[ 1] Schroeder J., Schroeder G.
Z. Naturforsch. 45C:1-8(1990).
[ 2] Lanz T., Tropf S., Marner F.-J., Schroeder J., Schroeder G.
J. Biol. Chem. 266:9971-9976(1991).

[0341] 93. Chorismate_synt (Chorismate synthase)
Number of members: 19

[0342] Chorismate synthase (EC 4.6.1.4) catalyzes the last of the seven steps in the shikimate pathway which is
used in prokaryotes, fungi and plants for the biosynthesis of aromatic amino acids. It catalyzes the 1,4-trans elimination
of the phosphate group from 5-enolpyruvylshikimate-3-phosphate (EPSP) to form chorismate which can then be used
in phenylalanine, tyrosine or tryptophan biosynthesis. Chorismate synthase requires the presence of a reduced flavin
mononucleotide (FMNH2 or FADH2) for its activity.

[0343] Chorismate synthase from various sources shows [1,2] a high degree of sequence conservation. It is a protein
of about 360 to 400 amino-acid residues. Three signature patterns have been developed from conserved regions rich
in basic residues (mostly arginines). The first is in the N-terminal section, the second is central and the third is C-terminal.

- Consensus pattern: G-E-S-H-[GC]-x(2)-[LIVM]-[GTV]-x-[LIVM](2)-[DE]-G-x-[PV]

- Consensus pattern: [GE]-R-[SA](2)-[SAG]-R-[EV]-[ST]-x(2)-[RH]-V-x(2)-G

- Consensus pattern: R-[SH]-D-[PSV]-[CSAV]-x(4)-[GAI]-x-[IVGSP]-[LIVM]-x-E-[STAH]-[LIVM]

[ 1] Schaller A., Schmid J., Leibinger U., Amrhein N.
J. Biol. Chem. 266:21434-21438(1991).
[ 2] Jones D.G.L., Reusser U., Braus G.H.
Mol. Microbiol. 5:2143-2152(1991).

[0344] 94. Clat_adaptor_s (Clathrin adaptor complex small chain)
Number of members: 21

[0345] Clathrin coated vesicles (CCV) mediate intracellular membrane traffic such as receptor mediated endocytosis.
In addition to clathrin, the CCV are composed of a number of other components including oligomeric complexes which
are known as adaptor or clathrin assembly proteins (AP) complexes [1]. The adaptor complexes are believed to interact
with the cytoplasmic tails of membrane proteins, leading to their selection and concentration. In mammals two types of
adaptor complexes are known: AP-1 which is associated with the Golgi complex and AP-2 which is associated with
the plasma membrane. Both AP-1 and AP-2 are heterotetramers that consist of two large chains - the adaptins -
(gamma and beta' in AP-1; alpha and beta in AP-2); a medium chain (AP47 in AP-1; AP50 in AP-2) and a small chain
(AP19 in AP-1; AP17 in AP-2).

[0346] The small chains of AP-1 and AP-2 are evolutionarily related proteins of about 18 Kd. Homologs of AP17 and
AP19 have also been found in yeast (genes APS1/YAP19 and APS2/YAP17) [2,3,4]. AP17 and AP19 are also related
to the zeta-chain [5] of coatomer (zeta-cop), a cytosolic protein complex that reversibly associates with Golgi
membranes to form vesicles that mediate biosynthetic protein transport from the endoplasmic reticulum, via the Golgi, up
to the trans Golgi network.

[0347] A conserved region in the central section of these proteins has been selected as a signature pattern.

- Consensus pattern: [LIVM](2)-Y-[KR]-x(4)-L-Y-F 

[ 1] Pearse B.M., Robinson M.S.
Annu. Rev. Cell Biol. 6:151-171(1990).
[ 2] Kirchhausen T., Davis A.C., Frucht S., O'Brine Greco B., Payne G.S.,
Tubb B.
J. Biol. Chem. 266:11153-11157(1991).
[ 3] Nakai M., Takada T., Endo T.
Biochim. Biophys. Acta 1174:282-284(1993).
[ 4] Phan H.L., Finlay J.A., Chu D.S., Tan P.K., Kirchhausen T., Payne G.S.
EMBO J. 13:1706-1717(1994).
[ 5] Kuge O., Hara-Kuge S., Orci L., Ravazzola M., Amherdt M., Tanigawa G.,
Wieland F.T., Rothman J.E.
J. Cell Biol. 123:1727-1734(1993).

[0348] 95. Clathrin_lg_ch (Clathrin light chain)
Number of members: 8 

[0349] Clathrin [1,2] is the major coat-forming protein that encloses vesicles such as coated pits and forms cell
surface patches involved in membrane traffic within eukaryotic cells. The clathrin coats (called triskelions) are
composed of three heavy chains (180 Kd) and three light chains (23 to 27 Kd).

[0350] The clathrin light chains [3], which may help to properly orient the assembly and disassembly of the clathrin
coats, bind non-covalently to the heavy chain; they also bind calcium and interact with the hsc70 uncoating ATPase.

- In higher eukaryotes two genes code for distinct but related light chains: LC(a) and LC(b). Each of the two genes
can yield, by tissue-specific alternative splicing, two separate forms which differ by the insertion of a sequence of
respectively thirty or eighteen residues. There is, in the N-terminal part of the clathrin light chains, a domain of
twenty-one amino acid residues which is perfectly conserved in LC(a) and LC(b).
- In yeast there is a single light chain (gene CLC1) whose sequence is only distantly related to that of higher
eukaryotes.

[0351] Two signature patterns have been developed for clathrin light chains. The first pattern is a heptapeptide from
the center of the conserved N-terminal region of eukaryotic light chains; the second pattern is derived from a positively
charged region located in the C-terminal extremity of all known clathrin light chains.

- Consensus pattern: F-L-A-Q-Q-E-S

[ 1] Keen J.H.
Annu. Rev. Biochem. 59:415-438(1990).
[ 2] Brodsky F.M.
Science 242:1396-1402(1988).
[ 3] Brodsky F.M., Hill B.L., Acton S.L., Naethke I., Wong D.H.,
Ponnambalam S., Parham P.
Trends Biochem. Sci. 16:208-213(1991).

[0352] 96. (Clathrin repeat) 7-fold repeat in Clathrin and VPS 

Each repeat is about 140 amino acids long. The repeats occur in the arm region of the Clathrin heavy chain. 
Number of members: 79 
[1]

Medline: 92191269

Folding and trimerization of clathrin subunits at the triskelion hub.
Nathke IS, Heuser J, Lupas A, Stock J, Turck CW, Brodsky FM;
Cell 1992;68:899-910.
[2]
Medline: 88097376

Clathrin heavy chain: molecular cloning and complete primary structure.
Kirchhausen T, Harrison SC, Chow EP, Mattaliano RJ,
Ramachandran KL, Smart J, Brosius J;
Proc Natl Acad Sci U S A 1987;84:8805-8809.
[0353] 97. Collagen (Collagen triple helix repeat (20 copies))
[1] Medline: 94059583
New members of the collagen superfamily.
Mayne R, Brewton RG;
Curr Opin Cell Biol 1993;5:883-890.

Scurvy is associated with collagens.

Members of this family belong to the collagen superfamily [1].

Collagens are generally extracellular structural proteins involved in formation of connective tissue structure.
The alignment contains 20 copies of the G-X-Y repeat that forms a triple helix. The first position of the repeat is glycine;
the second and third positions can be any residue but are frequently proline and hydroxyproline. Collagens are
post-translationally modified by proline hydroxylase to form the hydroxyproline residues. Defective hydroxylation is the cause
of scurvy.
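
The G-X-Y periodicity described above lends itself to a simple sequence check. As a rough sketch only (it is not the profile method used to build the family alignment), the Python function below reports the longest uninterrupted run of G-X-Y triplets in a protein sequence; a run of 20 or more triplets would correspond to the 20 copies contained in the alignment. The function name and the test sequences are hypothetical.

```python
import re

# A lookahead is used so that a run is tried from every start position;
# a plain left-to-right scan could lock onto an out-of-frame glycine.
GXY_RUN = re.compile(r"(?=((?:G..)+))")


def longest_gxy_run(sequence: str) -> int:
    """Return the number of triplets in the longest uninterrupted G-X-Y run."""
    return max(
        (len(m.group(1)) // 3 for m in GXY_RUN.finditer(sequence)),
        default=0,
    )


# Hypothetical sequences for illustration.
print(longest_gxy_run("MKGPPGAPGEKAA"))      # three consecutive triplets
print(longest_gxy_run("GPP" * 20) >= 20)     # meets the 20-copy length
```

Hydroxyproline is introduced post-translationally, so in a translated sequence the Y position appears as proline.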

Some members of the collagen superfamily are not involved in connective tissue structure but share the same triple
helical structure.

Number of members: 2125

[0354] 98. Coprogen_oxidas (Coproporphyrinogen III oxidase)
Number of members: 12

Coproporphyrinogen III oxidase (EC 1.3.3.3) (coproporphyrinogenase) [1,2] catalyzes the oxidative decarboxylation
of coproporphyrinogen III into protoporphyrinogen IX, a common step in the pathway for the biosynthesis of porphyrins
such as heme, chlorophyll or cobalamin.

[0355] Coproporphyrinogen III oxidase is an enzyme that requires iron for its activity. A cysteine seems to be important
for the catalytic mechanism [3]. Sequences from a variety of eukaryotic and prokaryotic sources show that this enzyme
has been evolutionarily conserved. A highly conserved region in the central part of the sequence has been selected
as a signature pattern. This region contains the only conserved cysteine and is rich in charged amino acids.

- Consensus pattern: K-x-W-C-x(2)-[FYH](3)-[LIVM]-x-H-R-x-E-x-R-G-[LIVM]-G-G-[LIVM]-F-F-D

[ 1] Xu K., Elliott T.
J. Bacteriol. 175:4990-4999(1993).
[ 2] Kohno H., Furukawa T., Yoshinaga T., Tokunaga R., Taketani S.
J. Biol. Chem. 268:21359-21363(1993).
[ 3] Camadro J.M., Chambon H., Jolles J., Labbe P.
Eur. J. Biochem. 156:579-587(1986).
[ 4] Xu K., Elliott T.
J. Bacteriol. 176:3196-3203(1994).

[0356] 99. Corona_nucleoca (Coronavirus nucleocapsid protein) 
[1]

Medline: 98087828

Identification of a specific interaction between the
coronavirus mouse hepatitis virus A59 nucleocapsid protein
and packaging signal.
Molenkamp R, Spaan WJ;
Virology 1997;239:78-86.

Number of members: 44

[0357] 100. Cu-oxidase (Multicopper oxidase) 
[1]

Medline: 90126844
The blue oxidases, ascorbate oxidase, laccase and ceruloplasmin.
Modelling and structural relationships.
Messerschmidt A, Huber R;
Eur J Biochem 1990;187:341-352.
Number of members: 150

[0358] Multicopper oxidases [1,2] are enzymes that possess three spectroscopically different copper centers. These
centers are called: type 1 (or blue), type 2 (or normal) and type 3 (or coupled binuclear). The enzymes that belong to
this family are:

Number of members: 24

collectively termed cullins [1]:

Mol. Cell. Biol. 16:6634-6643(1996).

[0365] 102. (Cu_amine_oxid)
Copper amine oxidase signatures
Amine oxidases (AO) [1] are enzymes that catalyze the oxidation of a wide range of biogenic amines including many
neurotransmitters, histamine and xenobiotic amines. There are two classes of amine oxidases: flavin-containing (EC
1.4.3.4) and copper-containing (EC 1.4.3.6).

[0366] Copper-containing AO is found in bacteria, fungi, plants and animals. It is a homodimeric enzyme that binds
one copper ion per subunit as well as a 2,4,5-trihydroxyphenylalanine quinone (or topaquinone) (TPQ) cofactor. This
cofactor is derived from a tyrosine residue.

[0367] Two signature patterns were derived for copper AO: the first one contains the tyrosine which gives rise to the
TPQ cofactor while the second one contains one of the three histidines that bind the copper atom [2].
[0368] Consensus pattern: [LIVM]-[LIVMA]-[LIVMF]-x(4)-[ST]-x(2)-N-Y-[DE]-[YN] [The first Y gives rise to TPQ].
Sequences known to belong to this class detected by the pattern: ALL.

[0369] Consensus pattern: T-x-[GS]-x(2)-H-[LIVMF]-x(3)-E-[DE]-x-P [H is a copper ligand]. Sequences known to
belong to this class detected by the pattern: ALL, except for lentil AO.

[ 1] Knowles P.F., Dooley D.M. (In) Metal ions in biological systems; Sigel H., Sigel A., Eds., 30:361-403, Marcel
Dekker, New York, (1993).
[ 2] Parsons M.R., Convery M.A., Wilmot C.M., Yadav K.D.S., Blakeley V., Corner A.S., Phillips S.E.V., McPherson
M.J., Knowles P.F. Structure 3:1171-1184(1995).

[0370] 103. Cys-protease (Cysteine protease) 
Number of members: 358

[0371] Eukaryotic thiol proteases (EC 3.4.22.-) [1] are a family of proteolytic enzymes which contain an active site
cysteine. Catalysis proceeds through a thioester intermediate and is facilitated by a nearby histidine side chain; an
asparagine completes the essential catalytic triad. The proteases which are currently known to belong to this family
are listed below (references are only provided for recently determined sequences).

- Vertebrate lysosomal cathepsins B (EC 3.4.22.1), H (EC 3.4.22.16), L (EC 3.4.22.15), and S (EC 3.4.22.27) [2].

- Vertebrate lysosomal dipeptidyl peptidase I (EC 3.4.14.1) (also known as cathepsin C) [2].

- Vertebrate calpains (EC 3.4.22.17). Calpains are intracellular calcium-activated thiol proteases that contain both an
N-terminal catalytic domain and a C-terminal calcium-binding domain.

- Mammalian cathepsin K, which seems involved in osteoclastic bone resorption [3].

- Human cathepsin O [4].

- Bleomycin hydrolase. An enzyme that catalyzes the inactivation of the antitumor drug BLM (a glycopeptide).

- Plant enzymes: barley aleurain (EC 3.4.22.16), EP-B1/B4; kidney bean EP-C1, rice bean SH-EP; kiwi fruit actinidin
(EC 3.4.22.14); papaya latex papain (EC 3.4.22.2), chymopapain (EC 3.4.22.6), caricain (EC 3.4.22.30), and
proteinase IV (EC 3.4.22.25); pea turgor-responsive protein 15A; pineapple stem bromelain (EC 3.4.22.32); rape
COT44; rice oryzain alpha, beta, and gamma; tomato low-temperature induced; Arabidopsis thaliana A494, RD19A
and RD21A.

- House-dust mite allergens DerP1 and EurM1.

- Cathepsin B-like proteinases from the worms Caenorhabditis elegans (genes gcp-1, cpr-3, cpr-4, cpr-5 and cpr-6),
Schistosoma mansoni (antigen SM31) and japonicum (antigen SJ31), Haemonchus contortus (genes AC-1 and
AC-2), and Ostertagia ostertagi (CP-1 and CP-3).

- Slime mold cysteine proteinases CP1 and CP2.

- Cruzipain from Trypanosoma cruzi and brucei.

- Trophozoite cysteine proteinase (TCP) from various Plasmodium species.

- Proteases from Leishmania mexicana, Theileria annulata and Theileria parva.

- Baculoviruses cathepsin-like enzyme (v-cath).

- Drosophila small optic lobes protein (gene sol), a neuronal protein that contains a calpain-like domain.

- Yeast thiol protease BLH1/YCP1/LAP3.

- Caenorhabditis elegans hypothetical protein C06G4.2, a calpain-like protein.
[0372] Two bacterial peptidases are also part of this family:

- Aminopeptidase C from Lactococcus lactis (gene pepC) [5].
- Thiol protease tpr from Porphyromonas gingivalis.
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1 Lure 340:604-604(1989) 

[ 7] Rawlings N.D., Barrett A.J.
Meth. Enzymol. 244:461-486(1994).

[0375] 104. Cys_Met_Meta_PP (Cys/Met metabolism PLP-dependent enzyme)

[1] Medline: 96428687
Crystal structure of the pyridoxal-5'-phosphate dependent cystathionine beta-lyase from Escherichia coli at 1.83 A.
J Mol Biol 1996;262:202-224.

[2] Medline:
Crystal structure of Escherichia coli cystathionine gamma-synthase at 1.5 A resolution.
Clausen T, Huber R, Prade L, Wahl MC, Messerschmidt A;
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- Cystathionine gamma-lyase (EC 4.4.1.1 ) (gamma-cystathionase), which catalyzes the transformation ot cystath- 
ionine into cysteine, oxobutanoate and ammonia. This is the final reaction in the transsulfuration pathway that leads
from methionine to cysteine in eukaryotes.

- Cystathionine gamma-synthase (EC 4.2.99.9), which catalyzes the conversion of cysteine and succinyl-homoserine
into cystathionine and succinate: the first step in the biosynthesis of methionine from cysteine in bacteria (gene
metB).

- Cystathionine beta-lyase (EC 4.4.1.8) (beta-cystathionase), which catalyzes the conversion of cystathionine into
homocysteine, pyruvate and ammonia: the second step in the biosynthesis of methionine from cysteine in bacteria
(gene metC).

- Methionine gamma-lyase (EC 4.4.1.11) (L-methioninase), which catalyzes the transformation of methionine into
methanethiol, oxobutanoate and ammonia.

- OAH/OAS sulfhydrylase, which catalyzes the conversion of acetylhomoserine into homocysteine and that of
acetylserine into cysteine (gene MET17 or MET25 in yeast).

- O-succinylhomoserine sulfhydrylase (EC 4.2.99.-).

- Yeast hypothetical protein YGL184c.

- Yeast hypothetical protein YHR112c.

[0377] These enzymes are proteins of about 400 amino-acid residues. The pyridoxal-P group is attached to a lysine
residue located in the central section of these enzymes; the sequence around this residue is highly conserved and can
be used as a signature pattern to detect this class of enzymes.

- Consensus pattern: [DQ]-[LIVMF]-x(3)-[STAGC]-[STAGCI]-T-K-[FYWQ]-[LIVMF]-x-G-[HQ]-[SGNH] [K is the
pyridoxal-P attachment site]
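
The consensus patterns in this section follow the PROSITE notation: elements are separated by '-', 'x' stands for any residue, square brackets list the residues accepted at a position, curly braces list the residues excluded, and a parenthesized number or range gives a repeat count. As a minimal sketch, assuming sequences are given as upper-case one-letter codes, such a pattern can be translated into a regular expression. The function name prosite_to_regex and the test peptide are hypothetical and are not part of the original disclosure.

```python
import re

# One pattern element: a residue specification with an optional repeat count.
_ELEMENT = re.compile(
    r"(x|[A-Z]|\[[A-Z]+\]|\{[A-Z]+\})"   # x, single residue, [allowed], {excluded}
    r"(?:\((\d+)(?:,(\d+))?\))?"          # optional (n) or (n,m)
)

def prosite_to_regex(pattern: str) -> str:
    """Translate a PROSITE-style pattern into a Python regular expression."""
    parts = []
    for element in pattern.strip().rstrip(".").split("-"):
        m = _ELEMENT.fullmatch(element)
        if m is None:
            raise ValueError(f"unrecognised pattern element: {element!r}")
        residue, low, high = m.groups()
        if residue == "x":
            residue = "."                              # any residue
        elif residue.startswith("{"):
            residue = "[^" + residue[1:-1] + "]"       # excluded residues
        if low is not None:
            residue += "{" + low + ("," + high if high else "") + "}"
        parts.append(residue)
    return "".join(parts)

# Signature of the Cys/Met metabolism PLP-dependent enzymes (paragraph [0377]).
PLP_PATTERN = ("[DQ]-[LIVMF]-x(3)-[STAGC]-[STAGCI]-T-K-[FYWQ]-"
               "[LIVMF]-x-G-[HQ]-[SGNH]")
plp_regex = prosite_to_regex(PLP_PATTERN)

# Hypothetical peptide constructed to contain the signature, for illustration only.
toy_peptide = "MKDLAAASTTKFLAGHSPE"
hit = re.search(plp_regex, toy_peptide)
```

Here hit.start() gives the zero-based offset of the signature in the toy peptide; the lysine that would carry the pyridoxal-P group lies eight residues further on.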

[ 1] Ono B.I., Tanaka K., Naito K., Heike C., Shinoda S., Yamamoto S., Ohmori S., Oshima T., Toh-E A. J. Bacteriol.
174:3339-3347(1992).

[ 2] Barton A.B., Kaback D.B., Clark M.W., Keng T., Ouellette B.F.F., Storms R.K., Zeng B., Zhong W.W., Fortin
N., Delaney S., Bussey H. Yeast 9:363-369(1993).

[0378] 105. Cyt_reductase

FAD/NAD-binding Cytochrome reductase 
Number of members: 60 

[1] Medline: 95111952

Crystal structure of the FAD-containing fragment of corn nitrate reductase at 2.5 A resolution: relationship to other
flavoprotein reductases.

Lu G, Campbell WH, Schneider G, Lindqvist Y; 
Structure 1994;2:809-821. 

[2] Medline: 92084635
The sequence of squash NADH:nitrate reductase and its relationship to the sequences of other flavoprotein
oxidoreductases. A family of flavoprotein pyridine nucleotide cytochrome reductases.
Hyde GE, Crawford NM, Campbell WH; 

J Biol Chem 1991;266:23542-23547. 
[0379] 106. Cytidylyltrans
Phosphatidate cytidylyltransferase
Number of members: 21

[0380] Phosphatidate cytidylyltransferase (EC 2.7.7.41) [1,2,3] (also known as CDP-diacylglycerol synthase) (CDS)
is the enzyme that catalyzes the synthesis of CDP-diacylglycerol from CTP and phosphatidate (PA). CDP-diacylglycerol
is an important branch point intermediate in both prokaryotic and eukaryotic organisms. CDS is a membrane-bound
enzyme. A conserved region located in the C-terminal part has been selected as a signature pattern.


Consensus pattern: S-x-[LIVMF]-K-R-x(4)-K-D-x-[GSA]-x(2)-[LI]-[PG]-x-H-G-G-[LIVM]-x-D-R-[LIVMF]-D


[ 1] Sparrow C.P., Raetz C.R.H.
J. Biol. Chem. 260:12084-12091(1985).
[ 2] Shen H., Heacock P.N., Clancey C.J., Dowhan W.
J. Biol. Chem. 271:789-795(1996).
[ 3] Saito S., Goto K., Tonosaki A., Kondo H.
J. Biol. Chem. 272:9503-9509(1997).
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Desmoglein 1 (desmosomal glycoprotein I). 

Desmoglein 2.

Desmoglein 3 (Pemphigus vulgaris antigen).

[0387] The Drosophila fat protein [3] is a huge protein of over 5000 amino acids that contains 34 cadherin-like repeats
in its extracellular domain.

[0388] The signature pattern that was developed for the repeated domain is located in the C-terminal extremity,
which is its best conserved region. The pattern includes two conserved aspartic acid residues as well as two
asparagines; these residues could be implicated in the binding of calcium.
[0389] Consensus pattern: [LIV]-x-[LIV]-x-D-x-N-D-[NH]-x-P
Sequences known to belong to this class detected by the pattern: ALL. Note: this pattern is found in the first, second,
and fourth copies of the repeated domain. In the third copy there is a deletion of one residue after the second
conserved Asp.
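
As a minimal sketch of how a repeated signature of this kind could be located in every copy of a domain, the cadherin pattern can be written by hand as a regular expression and wrapped in a lookahead so that all occurrences are reported. The toy sequence is hypothetical: it is built from three artificial copies, the last lacking one residue after the second conserved Asp, so only two hits are expected.

```python
import re

# Hand translation of [LIV]-x-[LIV]-x-D-x-N-D-[NH]-x-P.
# The lookahead reports every occurrence, including overlapping ones.
CADHERIN_SIGNATURE = re.compile(r"(?=([LIV].[LIV].D.ND[NH].P))")

def find_cadherin_signatures(sequence):
    """Return (zero-based start, matched text) for each signature occurrence."""
    return [(m.start(), m.group(1)) for m in CADHERIN_SIGNATURE.finditer(sequence)]

# Hypothetical repeat copies, for illustration only.
COPY_1 = "LAVADANDNAP"
COPY_2 = "IGIGDGNDHGP"
COPY_3_DELETED = "VSVSDSNDSP"   # one residue missing after the second Asp

toy_sequence = "MK" + COPY_1 + "GG" + COPY_2 + "GG" + COPY_3_DELETED + "EE"
hits = find_cadherin_signatures(toy_sequence)
```

The copy carrying the deletion is not reported, which mirrors the note above that the third copy of the repeated domain escapes the pattern.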

[ 1] Takeichi M. Annu. Rev. Biochem. 59:237-252(1990). 
[ 2] Takeichi M. Trends Genet. 3:213-217(1987).

[ 3] Mahoney P.A., Weber U., Onofrechuk P., Biessmann H., Bryant P.J., Goodman C.S. Cell 67:853-868(1991).

[0390] 110. Calreticulin family signatures

Calreticulin [1] (also known as calregulin, CRP55 or HACBP) is a high-capacity calcium-binding protein which is present
in most tissues and located at the periphery of the endoplasmic (ER) and the sarcoplasmic reticulum (SR) membranes.
It probably plays a role in the storage of calcium in the lumen of the ER and SR and it may well have other important
functions. Structurally, calreticulin is a protein of about 400 amino acid residues consisting of three domains: a) An
N-terminal, probably globular, domain of about 180 amino acid residues (N-domain); b) A central domain of about 70
residues (P-domain) which contains three repeats of an acidic 17 amino acid motif. This region binds calcium with a
low capacity but a high affinity; c) A C-terminal domain rich in acidic residues and in lysine (C-domain). This region
binds calcium with a high capacity but a low affinity. Calreticulin is evolutionarily related to the following proteins:

- Onchocerca volvulus antigen RAL-1. RAL-1 is highly similar to calreticulin, but possesses a C-terminal domain rich in
lysine and arginine and lacks acidic residues and is therefore not expected to bind calcium in that region.
- Calnexin [2]. A calcium-binding protein that interacts with newly synthesized glycoproteins in the endoplasmic
reticulum. It seems to play a major role in the quality control apparatus of the ER by the retention of incorrectly folded
proteins.
- Calmegin [3] (or calnexin-T), a testis-specific calcium-binding protein highly similar to calnexin.

Three signature patterns have been developed for this family of proteins. The first two patterns are based on conserved
regions in the N-domain; the third pattern corresponds to positions 4 to 16 of the repeated motif in the P-domain.
Consensus pattern: [KRHN]-x-[DEQN]-[DEQNK]-x(3)-C-G-G-[AG]-[FY]-[LIVM]-[KN]-[LIVMFY](2)

Consensus pattern: [LIVM](2)-F-G-P-D-x-C-[AG]

Consensus pattern: [IV]-x-D-x-[DENST]-x(2)-K-P-[DEH]-D-W-[DEN]

[ 1] Michalak M., Milner R.E., Burns K., Opas M. Biochem. J. 285:681-692(1992). 

[ 2] Bergeron J.J.M., Brenner M.B., Thomas D.Y., Williams D.B. Trends Biochem. Sci. 19:124-128(1994). 
[ 3] Watanabe D., Yamada K., Nishina Y., Tajima Y., Koshimizu U., Nagata A., Nishimune Y. J. Biol. Chem. 269:7744-7749(1994).

[0391] 111. Eukaryotic-type carbonic anhydrases signature (carb_anhydrase)

Carbonic anhydrases (EC 4.2.1.1) (CA) [1,2,3,4] are zinc metalloenzymes which catalyze the reversible hydration of
carbon dioxide. Eight enzymatic and evolutionarily related forms of carbonic anhydrase are currently known to exist in
vertebrates: three cytosolic isozymes (CA-I, CA-II and CA-III); two membrane-bound forms (CA-IV and CA-VII); a
mitochondrial form (CA-V); a secreted salivary form (CA-VI); and a yet uncharacterized isozyme [5]. In the alga
Chlamydomonas reinhardtii, two CA isozymes have been sequenced [6]. They are periplasmic glycoproteins
evolutionarily related to vertebrate CAs. Some bacteria, such as Neisseria gonorrhoeae [7], also have a eukaryotic-type CA.
CAs contain a single zinc atom bound to three conserved histidine residues. As a signature for CAs, a pattern has
been developed which includes one of these zinc-binding histidines. Protein D8 from Vaccinia and other poxviruses is
related to CAs but has lost two of the zinc-binding histidines as well as many otherwise conserved residues. This is
also true of the N-terminal extracellular domain of some receptor-type tyrosine-protein phosphatases (see
<PDOC00323>).

Consensus pattern: S-E-[HN]-x-[LIVM]-x(4)-[FYH]-x(2)-E-[LIVMGA]-H-[LIVMFA](2) [The second H is a zinc ligand]
Note: most prokaryotic CAs as well as plant chloroplast CAs belong to another, evolutionarily distinct family of proteins
(see <PDOC00586 
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[ 1J Deutsch H.F. Int. J. Biochem. 19:101-113(1987). 
[ 2] Fernley R.T. Trends Biochem. Sci. 13:356-359(1988).
[ 3] Tashian R.E. BioEssays 10:186-192(1989).
[ 4] Edwards Y. Biochem. Soc. Trans. 18:171-175(1990).
[ 5] Skaggs L.A., Bergenhem N.C.H., Venta P.J., Tashian R.E. Gene 126:291-292(1993).

[ 6] Fujiwara S., Fukuzawa H., Tachiki A., Miyachi S. Proc. Natl. Acad. Sci. U.S.A. 87:9779-9783(1990).

[ 7] Huang S., Xue Y., Sauer-Eriksson E., Chirica L., Lindskog S., Jonsson B.H. J. Mol. Biol. 283:301-310(1998).

[0392] 112. Caseins alpha/beta signature

Caseins [1] are the major protein constituent of milk. Caseins can be classified into two families; the first consists of
the kappa-caseins, and the second groups the alpha-s1, alpha-s2, and beta-caseins. The alpha/beta caseins are a
rapidly diverging family of proteins. However, two regions are conserved: a cluster of phosphorylated serine residues
and the signal sequence. The signature pattern has been developed for this family of proteins based upon the last
eight residues of the signal sequence.

Consensus pattern: C-L-[LV]-A-x-A-[LVF]-A

[1] Holt C., Sawyer L. Protein Eng. 2:251-259(1988).

[0393] 113. Catalase signatures

Catalase (EC 1.11.1.6) [1,2,3] is an enzyme, present in all aerobic cells, that decomposes hydrogen peroxide to
molecular oxygen and water. Its main function is to protect cells from the toxic effects of hydrogen peroxide. In eukaryotic
organisms and in some prokaryotes catalase is a molecule composed of four identical subunits. Each of the subunits
binds one protoheme IX group. A conserved tyrosine serves as the heme proximal side ligand. The region around this
residue has been used as a first signature pattern; it also includes a conserved arginine that participates in
heme-binding. A conserved histidine has been shown to be important for the catalytic mechanism of the enzyme. The region
around this residue has been selected as a second signature pattern.

Consensus pattern: R-[LIVMFSTAN]-F-[GASTNP]-Y-x-D-[AST]-[QEH] [Y is the proximal heme-binding ligand]
Consensus pattern: [IF]-x-[RH]-x(4)-[EQ]-R-x(2)-H-x(2)-[GAS]-[GASTF]-[GAST] [H is an active site residue]
Note: some prokaryotic catalases belong to the peroxidase family (see <PDOC00394>).
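
Where a family is described by two signature patterns, a sequence may be scanned for each pattern separately and the results combined. The following is a minimal sketch, assuming upper-case one-letter sequences; the dictionary keys, function names and toy sequences are hypothetical labels chosen for illustration, and requiring both signatures is only one possible decision rule.

```python
import re

# Hand translations of the two catalase consensus patterns given above.
SIGNATURES = {
    "proximal_heme_ligand": re.compile(r"R[LIVMFSTAN]F[GASTNP]Y.D[AST][QEH]"),
    "active_site": re.compile(r"[IF].[RH].{4}[EQ]R.{2}H.{2}[GAS][GASTF][GAST]"),
}

def scan_catalase_signatures(sequence):
    """Map each signature name to the zero-based start positions of its hits."""
    return {name: [m.start() for m in regex.finditer(sequence)]
            for name, regex in SIGNATURES.items()}

def has_both_signatures(sequence):
    """True only if every signature is found at least once."""
    hits = scan_catalase_signatures(sequence)
    return all(hits[name] for name in SIGNATURES)

# Hypothetical sequences constructed for illustration only.
toy_full = "MA" + "FDRERIPERVVHAKGAG" + "GGGG" + "RLFSYTDTQ" + "KK"
toy_partial = "MA" + "RLFSYTDTQ" + "KK"

full_hits = scan_catalase_signatures(toy_full)
partial_hits = scan_catalase_signatures(toy_partial)
```

A sequence such as toy_partial, which carries only the tyrosine signature, is reported with an empty list for the active site signature and is rejected by has_both_signatures.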

[ 1] Murthy M.R.N., Reid T.J. III, Sicignano A., Tanaka N., Rossmann M.G. J. Mol. Biol. 152:465-499(1981).

[ 2] Melik-Adamyan W.R., Barynin V.V., Vagin A.A., Borisov V.V., Vainshtein B.K., Fita I., Murthy M.R.N., Rossmann
M.G. J. Mol. Biol. 188:63-72(1986).

[ 3] von Ossowski I., Hausner G., Loewen P.C. J. Mol. Evol. 37:71-76(1993).

[0394] 114. (chitin binding) Chitin recognition or binding domain signature

A conserved domain of 43 amino acids is found in several plant and fungal proteins that have a common binding
specificity for oligosaccharides of N-acetylglucosamine [1]. This domain may be involved in the recognition or binding
of chitin subunits. It has been found in the proteins listed below.

- A number of non-leguminous plant lectins. The best characterized of these lectins are the three highly homologous
wheat germ agglutinins (WGA-1, 2 and 3). WGA is an N-acetylglucosamine/N-acetylneuraminic acid binding lectin which
structurally consists of a fourfold repetition of the 43 amino acid domain. The same type of structure is found in a
barley root-specific lectin as well as a rice lectin.
- Plant endochitinases (EC 3.2.1.14) from class IA (see <PDOC00620>). Endochitinases are enzymes that catalyze
the hydrolysis of the beta-1,4 linkages of N-acetylglucosamine polymers of chitin. Plant chitinases function as a defense
against chitin-containing fungal pathogens. Class IA chitinases generally contain one copy of the chitin-binding domain
at their N-terminal extremity. An exception is agglutinin/chitinase [2] from the stinging nettle Urtica dioica, which
contains two copies of the domain.
- Hevein [5], a wound-induced protein found in the latex of rubber trees.
- Win1 and win2, two wound-induced proteins from potato.
- Kluyveromyces lactis killer toxin alpha subunit [3]. The toxin encoded by the linear plasmid pGKL1 is composed of
three subunits: alpha, beta, and gamma. The gamma subunit harbors toxin activity and inhibits growth of sensitive yeast
strains in the G1 phase of the cell cycle; the alpha subunit, which is proteolytically processed from a larger precursor
that also contains the beta subunit, is a chitinase (see <PDOC00839>).

In chitinases, as well as in the potato wound-induced proteins, the 43-residue domain directly follows the signal
sequence and is therefore at the N-terminal of the mature protein; in the killer toxin alpha subunit it is located in the
central section of the protein. The domain contains eight conserved cysteine residues which have all been shown, in WGA,
to be involved in disulfide bonds. The topological arrangement of the four disulfide bonds is shown in the following
figure:

xxCgxxxxxxxCxxxxCCsxxgxCgxxxxxCxxxCxxxxC

'C': conserved cysteine involved in a disulfide bond. '*': position of the pattern.
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- Consensus pattern: C-x(4 ( 5)-C-C-S-x(2)-G-x-C-G-x(4)-[FYW]-C [The five C's are involved in disulfide bonds] 

[ 1] Wright H.T., Sandrasegaram G., Wright C.S. J. Mol. Evol. 33:283-294(1991). 
[ 2] Lerner D.R., Raikhel N.V. J. Biol. Chem. 267:11085-11091(1992).
[ 3] Butler A.R., O'Donnell R.W., Martin V.J., Gooday G.W., Stark M.J.R. Eur. J. Biochem. 199:483-488(1991).

[0395] 115. (Chitinase 1) Chitinases family 19 signatures

Chitinases (EC 3.2.1.14) [1] are enzymes that catalyze the hydrolysis of the beta-1,4-N-acetyl-D-glucosamine linkages
in chitin polymers. From the viewpoint of sequence similarity chitinases belong to either family 18 or 19 in the
classification of glycosyl hydrolases [2,E1]. Chitinases of family 19 (also known as classes IA or I and IB or II) are enzymes
from plants that function in the defense against fungal and insect pathogens by destroying their chitin-containing cell
wall. Class IA/I and IB/II enzymes differ in the presence (IA/I) or absence (IB/II) of an N-terminal chitin-binding domain
(see the relevant entry <PDOC00025>). The catalytic domain of these enzymes consists of about 220 to 230 amino acid
residues. Two highly conserved regions have been selected as signature patterns; the first one is located in the
N-terminal section and contains one of the six cysteines which are conserved in most, if not all, of these chitinases and
which is probably involved in a disulfide bond.

Consensus pattern: C-x(4,5)-F-Y-[ST]-x(3)-[FY]-[LIVMF]-x-A-x(3)-[YF]-x(2)-F-[GSA]
Consensus pattern: [LIVM]-[GSA]-F-x-[STAG](2)-[LIVMFY]-W-[FY]-W-[LIVM]

[ 1] Flach J., Pilet P.-E., Jolles P. Experientia 48:701-716(1992).

[ 2] Henrissat B. Biochem. J. 280:309-316(1991). 

[0396] 116. chloroa_b-bind 

Chlorophyll A-B binding proteins. Number of members: 211 
[0397] 117. chromo

The 'chromo' (CHRromatin Organization MOdifier) domain [1 to 4] is a conserved region of about 60 amino acids which
was originally found in Drosophila modifiers of variegation, which are proteins that modify the structure of chromatin
to the condensed morphology of heterochromatin, a cytologically visible condition where gene expression is repressed.
In protein Polycomb, the chromo domain has been shown to be important for chromatin targeting. Proteins that contain
a chromo domain seem to fall into three classes:

a) Proteins which have an N-terminal chromo domain followed by a region which is related to but distinct from the
chromo domain and which has been termed [3] the 'chromo shadow' domain.

b) Proteins with a single chromo domain.

c) Proteins with paired tandem chromo domains.

[0398] Currently, this domain has been found in the following proteins:
[0399] Class A.

Drosophila heterochromatin protein Su(var)205 (HP1).
Human heterochromatin protein HP1 alpha.
Mammalian modifier 1 and modifier 2.

Fission yeast swi6, a protein involved in the repression of the silent mating-type loci mat2 and mat3.
[0400] Class B.

Drosophila protein Polycomb (Pc).
Mammalian modifier 3, a homolog of Pc.

Drosophila protein Su(var)3-9, a suppressor of position-effect variegation.
Human Mi-2 autoantigen, characteristic of dermatomyositis.

Fungal retrotransposon polyproteins: 'skippy' from Fusarium oxysporum, 'grasshopper' and 'MAGGY' from
Magnaporthe grisea and CfT-1 from Cladosporium fulvum.
Fission yeast hypothetical protein SpAC18G6.02c.
Caenorhabditis elegans hypothetical protein C29H12.5.
Caenorhabditis elegans hypothetical protein ZK1236.2.
Caenorhabditis elegans hypothetical protein T09A5.8.

[0401] Class C.
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Mammalian DNA-binding/helicase proteins CHD-1 to CHD-4. 
Yeast protein CHD1. 

[0402] The signature pattern for this domain corresponds to its best conserved section, which is located in its central
part.

- Consensus pattern: [FYL]-x-[UVMCHKR]-W-x-[GDNR]-[FYWL^ 

[ 1] Paro R. Trends Genet. 6:416-421(1990).
[ 2] Singh P.B., Miller J.R., Pearce J., Kothary R., Burton R.D., Paro R., James T.C., Gaunt S.J. Nucleic Acids Res.
19:789-794(1991).

[ 3] Aasland R., Stewart A.F. Nucleic Acids Res. 23:3168-3173(1995).

[ 4] Koonin E.V., Zhou S., Lucchesi J.C. Nucleic Acids Res. 23:4229-4233(1995).

[0403] 118. citrate_synt

Citrate synthase (EC 4.1.3.7) (CS) is the tricarboxylic acid cycle enzyme that catalyzes the synthesis of citrate from 
oxaloacetate and acetyl-CoA in an aldol condensation. CS can directly form a carbon-carbon bond in the absence of 
metal ion cofactors. 

[0404] In prokaryotes, citrate synthase is composed of six identical subunits. In eukaryotes, there are two isozymes
of citrate synthase: one is found in the mitochondrial matrix, the second is cytoplasmic. Both seem to be dimers of
identical chains.

[0405] There are a number of regions of sequence similarity between prokaryotic and eukaryotic citrate synthases. 
One of the best conserved contains a histidine which is one of three residues shown [1] to be involved in the catalytic 
mechanism of the vertebrate mitochondrial enzyme. This region has been used as a signature pattern. 

Consensus pattern: G-[FYA]-[GA]-H-x-[IV]-x(1,2)-[RKT]-x(2)-D-[PS]-R [H is an active site residue]
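
The x(1,2) element allows a gap of either one or two residues, so the position of the active site histidine is most conveniently read from the match itself. The following is a minimal sketch, assuming upper-case one-letter sequences; the function name and the two toy sequences are hypothetical and serve only to show both gap lengths.

```python
import re

# Hand translation of G-[FYA]-[GA]-H-x-[IV]-x(1,2)-[RKT]-x(2)-D-[PS]-R.
# The histidine is captured so that its position can be reported.
CITRATE_SYNTHASE_SIGNATURE = re.compile(
    r"G[FYA][GA](H).[IV].{1,2}[RKT].{2}D[PS]R"
)

def active_site_histidines(sequence):
    """Return the one-based position of the His in each signature match."""
    return [m.start(1) + 1
            for m in CITRATE_SYNTHASE_SIGNATURE.finditer(sequence)]

# Hypothetical sequences for illustration only.
one_residue_gap = "MSL" + "GYGHAVLRKTDPR" + "YT"    # x(1,2) spans one residue
two_residue_gap = "MSL" + "GFAHSIAAKEEDSR" + "YT"   # x(1,2) spans two residues
no_histidine = one_residue_gap.replace("GYGH", "GYGN")

positions_one = active_site_histidines(one_residue_gap)
positions_two = active_site_histidines(two_residue_gap)
positions_none = active_site_histidines(no_histidine)
```

Both toy sequences place the histidine at the same position even though the signature that follows it differs in length, and replacing the histidine abolishes the match.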

[0406] [1] Karpusas M., Branchaud B., Remington S.J. Biochemistry 29:2213-2219(1990).
[0407] 119. clpA_B
Chaperonin clpA/B

CAUTION! This family is a subfamily of the AAA superfamily. The threshold has been set very high to stop overlaps
with the AAA superfamily. This entry will be subsumed by AAA in the future.
Number of members: 39

[0408] A number of ATP-binding proteins that are thought to protect cells from extreme stress by controlling the
aggregation or denaturation of vital cellular structures have been shown [1,2] to be evolutionarily related. These proteins
are listed below.

- Escherichia coli clpA, which acts as the regulatory subunit of the ATP-dependent protease clp.
- Rhodopseudomonas blastica clpA homolog.
- Escherichia coli heat shock protein clpB and homologs in other bacteria.
- Bacillus subtilis protein mecB.
- Yeast heat shock protein 104 (gene HSP104), which is vital for tolerance to heat, ethanol and other stresses.
- Neurospora heat shock protein hsp98.
- Yeast mitochondrial heat shock protein 78 (gene HSP78) [3].
- CD4A and CD4B, two highly related tomato proteins that seem to be located in the chloroplast.
- Trypanosoma brucei protein clp.
- Porphyra purpurea chloroplast encoded clpC.

[0409] The size of these proteins ranges from 84 Kd (clpA) to slightly more than 100 Kd (HSP104). They all share
two conserved regions of about 200 amino acids that each contain an ATP-binding site. In addition to the ATP-binding
A and B motifs there are many parts in these two domains that are also conserved. Two of these regions have been
selected as signature patterns. The first signature is located in the first domain, some ten residues to the C-terminal
of the ATP-binding B motif. The second pattern is located in the second domain in-between the ATP-binding A and B
motifs.

- Consensus pattern: D-[AI]-[SGA]-N-[LIVMF](2)-K-[PT]-x-L-x(2)-G

- Consensus pattern: R-[LIVMFY]-D-x-S-E-[LIVMFY]-x-E-[KRQ]-x-[STA]-x-[STA]-[KR]-[LIVM]-x-G-[STA]
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PI 

Nat Struct Biol 1997;4:356-369.
[2]

Medline: 97290450
Crystal structure of the actin-binding protein actophorin from Acanthamoeba.
Leonard SA, Gittis AG, Petrella EC, Pollard TD, Lattman EE;
Nat Struct Biol 1997;4:369-373.
[4]

Cofilin promotes rapid actin filament turnover in vivo.
Lappalainen P, Drubin DG;
Nature 1997;388:78-82.

Severs actin filaments and binds to actin monomers.

Number of members: 44 filaments (F-actin) *«Vor b md to act ' related and 

belong to a lami.y o. iow molecular we.gnt < pH -dependenl actin-depo- 

»« ap— e ,a^. 
. P ,ants actin deoolymenz.ng factor (ADF). ^ 
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ergetic enzyme complex there is one with a molecular weight of 24 Kd (in mammals), which is a component of the iron- 
sulfur (IP) fragment of the enzyme. It seems to bind a 2Fe-2S iron-sulfur cluster. The 24 Kd subunit is nuclear encoded,
as a precursor form with a transit peptide, in mammals and in Neurospora crassa. The 24 Kd subunit is highly similar
to [3,4]: - Subunit E of Escherichia coli NADH-ubiquinone oxidoreductase (gene nuoE), - Subunit NQO2 of Paracoccus
denitrificans NADH-ubiquinone oxidoreductase. A highly conserved region, located in the central section of this subunit
and containing two conserved cysteines that are probably involved in the binding of the 2Fe-2S center, has been selected
as a signature pattern.

- Consensus pattern: D-x(2)-F-[ST]-x(5)-C-L-G-x-C-x(2)-[GA]-P [The two C's are putative 2Fe-2S ligands]

[ 1] Ragan C.I. Curr. Top. Bioenerg. 15:1-36(1987). 

[ 2] Weiss H., Friedrich T., Hofhaus G., Preis D. Eur. J. Biochem. 197:563-576(1991).
[ 3] Fearnley I.M., Walker J.E. Biochim. Biophys. Acta 1140:105-134(1992).

[ 4] Weidner U., Geier S., Ptock A., Friedrich T., Leif H., Weiss H. J. Mol. Biol. 233:109-122(1993).

[0414] 122. copper-bind 

Copper binding proteins, plastocyanin/azurin family 
Number of members: 70 

[0415] Blue or 'type-1' copper proteins are small proteins which bind a single copper atom and which are
characterized by an intense electronic absorption band near 600 nm [1,2]. The most well known members of this class of proteins
are the plant chloroplastic plastocyanins, which exchange electrons with cytochrome c6, and the distantly related
bacterial azurins, which exchange electrons with cytochrome c551. This family of proteins also includes all the proteins
listed below (references are only provided for recently determined sequences).

- Amicyanin from bacteria such as Methylobacterium extorquens or Thiobacillus versutus that can grow on
methylamine. Amicyanin appears to be an electron acceptor for methylamine dehydrogenase.
- Auracyanins A and B from Chloroflexus aurantiacus [3]. These proteins can donate electrons to cytochrome c-554.
- Blue copper protein from Alcaligenes faecalis.
- Cupredoxin (CPC) from cucumber peelings [4].
- Cusacyanin (basic blue protein; plantacyanin, CBP) from cucumber.
- Halocyanin from Natronobacterium pharaonis [5], a membrane associated copper-binding protein.
- Pseudoazurin from Pseudomonas.
- Rusticyanin from Thiobacillus ferrooxidans. Rusticyanin is an electron carrier from cytochrome c-552 to the a-type
oxidase [6].
- Stellacyanin from the Japanese lacquer tree.
- Umecyanin from horseradish roots.
- Allergen Ra3 from ragweed. This pollen protein is evolutionarily related to the above proteins, but seems to have
lost the ability to bind copper.

[0416] Although there is an appreciable amount of divergence in the sequence of all these proteins, the copper ligand
sites are conserved and a pattern which includes two of the ligands (a cysteine and a histidine) has been developed.

- Consensus pattern: [GA]-x(0,2)-[YSA]-x(0,1)-[VFY]-x-C-x(1,2)-[PG]-x(0,1)-H-x(2,4)-[MQ] [C and H are copper
ligands]

[ 1] Garret T.P.J., Clingeleffer D.J., Guss J.M., Rogers S.J., Freeman H.C. J. Biol. Chem. 259:2822-2825(1984). 
[ 2] Ryden L.G., Hunt L.T. J. Mol. Evol. 36:41-66(1993).

[ 3] McManus J.D., Brune D.C., Han J., Sanders-Loehr J., Meyer T.E., Cusanovich M.A., Tollin G., Blankenship R.E.
J. Biol. Chem. 267:6531-6540(1992).

[ 4] Mann K., Schaefer W., Thoenes U., Messerschmidt A., Mehrabian Z., Nalbandyan R. FEBS Lett. 314:220-223
(1992).

[ 5] Mattar S., Scharf B., Kent S.B.H., Rodewald K., Oesterhelt D., Engelhard M. J. Biol. Chem. 269:14939-14945
(1994).

[ 6] Yano T., Fukumori Y., Yamanaka T. FEBS Lett. 288:159-162(1991).

[0417] 123. Chaperonins cpn10 signature

Chaperonins [1,2] are proteins involved in the folding of proteins or the assembly of oligomeric protein complexes. 
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l0418] l^ape^nsc^ 
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Their role seems to be to ass st °^r P° yP P abundan ce .n prokaryotes ; <*™ P Kd , ein , knoW rvas 

assembly intooligomeric «^™™>™ Composed ot two dil.erent types ° 60 protei n shows weak 

3 n oninn Lm olicprnnric <»^~™ ' ™Z known as cpnlO (groE Mnbe er a).Tb P^ ^ ^ d ^ 
c P n60 (groEL in bactena) and a 10 Kd P ^ ^ ^ , o 580 am.no ac.d res ^ ba 

ATPase activity and is a h.ghly con >erve P groEL prote.n, which » e ss eniia ^ tube rculos,s and 

by dillo-on. names in dilleron ^^S^cyBnobB^rlal groEL anaguej^ Jjjam-W maior antigen 58 
and the assembiy o. ^"^^^^^ B (9 T T t SSS— binding-protein alpha and 

Consensus patlorn:MASl-x-lUtu) „ 7n99 i) 

[ 2] Zeilsta-Byalls J., t-ay ei ^- 
, 0419] Cha P eroninsTCP-lsigna = ^ 

The TCP-1 protein H .2] (ft*- C <^ "Jen lound and characterized .n m "V 0 e whjch participa ,es 

Lstis but present in ail ce.i types, it has s.n^^ , ^» KdJ^ MO ^ - t he 

DroS ophila and in yea*. J^^J^ sha ped partic.e 13) w,th 6 to 8 other drf relale d to TCP- 

in a hetero-or.gomer.c900 Kd d ™ beta ga mma,delta, eps.lon, zeta , ana e other prote ,n«, 

chaperonin containing TCP-1 ^f^^^eJar chaperone tor tubuhn. acl;n P» J & chap . 

1 itself l4.5).The CCT is known to ac as a mo clorial counterparts - ^5 and .^^ ^ ^ g 

0420] The CCT auburn* aro highly <°™° is known to b.nd untolded po y P P m The , her 

related to thecpn60/groEL *f TJ^, domain were chosen 
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\% r:rv^Hynes G.M., Zheng D.. Saibi. H.. Wi.l.son K.R. 
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arrsc > — - - - - ™ - ih - BB ^ 

HI 

TFIIB and RB/p107.

Gibson TJ, Thompson JD, Blocker A, Kouzarides T;
Nucleic Acids Res 1994;22:946-952.

ts 12] 

Medline: 96164440 

Mitchell E, Rasmussen B Hunt i. 
structure 1995,3:1235-1247. 
Complex ouyclin and cyc.in dependant k.nase. 

[3) 

Rus «oAA.Joii.syPD. p8 * ,i 5j. NP; 

S S^^-^^ & "^ ^ lMPF> ™~ "* ~ 

a.atfy **, G2 *'£S£J£. c.» o»C »» 3,/s (s«1 «"•»*»• 

[04251 The best conserved region is . 

I 2] Norbury C, Nurse k ouii. 2*77-81(1992). 

' * Le ": D , J '."SI'S .SSS R« Nature «»W»C««0. 

[ 4] Nicholas J.. Cameron r\ n., 
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terminal end is very divergent B ^ d 9 Number o1 Members: 147 anri bodv fluids of animals, in the larva 
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- Type 1 cystatins (or stefins), molecules of about 100 amino acid residues with neither disulfide bonds nor carbo- 
hydrate groups. 

- Type 2 cystatins, molecules of about 115 amino acid residues which contain one or two disulfide loops near their
C-terminus.

Kininogens, which are multifunctional plasma glycoproteins. 

[0428] They are the precursor of the active peptide bradykinin and play a role in blood coagulation by helping to
position optimally prekallikrein and factor XI next to factor XII. They are also inhibitors of cysteine proteases. Structurally,
kininogens are made of three contiguous type-2 cystatin domains, followed by an additional domain (of variable length)
which contains the sequence of bradykinin. The first of the three cystatin domains seems to have lost its inhibitory
activity.

[0429] In all these inhibitors, there is a conserved region of five residues which has been proposed to be important
for the binding to the cysteine proteases. The consensus pattern starts one residue before this conserved region.

- Consensus pattern: [GSTEQKRV]-Q-[LIVT]-[VAF]-[SAGQ]-G-x-[LIVMNK]-x(2)-[LIVMFY]-x-[LIVMFYA]-[DENQKRHSIV]

[1] Barrett A.J. Trends Biochem. Sci. 12:193-196(1987). 
[2] Rawlings N.D., Barrett A.J. J. Mol. Evol. 30:60-71(1990). 
[3] Turk V., Bode W. FEBS Lett. 285:213-219(1991). 

[4] Lustigman S., Brotman B., Huima T., Prince A.M. Mol. Biochem. Parasitol. 45:65-76(1991).

[0430] 127. cytochromes (Cytochrome c) 

The Pfam entry does not include all Prosite members.

The cytochrome 556 and cytochrome c' families are not included. 
Number of members: 259 

[0431] In proteins belonging to the cytochrome c family [1], the heme group is covalently attached by thioether bonds to
two conserved cysteine residues. The consensus sequence for this site is Cys-X-X-Cys-His and the histidine residue
is one of the two axial ligands of the heme iron. This arrangement is shared by all proteins known to belong to the
cytochrome c family, which presently includes cytochromes c, c', c1 to c6, c550 to c556, cc3/Hmc, cytochrome f and reaction
center cytochrome c.

- Consensus pattern: C-{CPWHF}-{CPWR}-C-H-{CFYW}
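
In this pattern the curly braces list residues that are not allowed at a position, which corresponds to a negated character class in a regular expression. The following is a minimal sketch, assuming clean upper-case one-letter sequences; the function name and the test peptides are hypothetical and are given for illustration only.

```python
import re

# Hand translation of C-{CPWHF}-{CPWR}-C-H-{CFYW}: each {…} becomes [^…].
# The two cysteines and the histidine are captured to report their positions.
HEME_SITE = re.compile(r"(C)[^CPWHF][^CPWR](C)(H)[^CFYW]")

def heme_binding_sites(sequence):
    """Return one-based positions of the two Cys and the His for each match."""
    sites = []
    for m in HEME_SITE.finditer(sequence):
        sites.append({
            "cys": (m.start(1) + 1, m.start(2) + 1),  # thioether-linked cysteines
            "his": m.start(3) + 1,                    # axial ligand of the heme iron
        })
    return sites

# Hypothetical peptides for illustration only.
toy_peptide = "GDVEKGKKIFVQKCAQCHTVEK"
sites = heme_binding_sites(toy_peptide)
excluded_second = heme_binding_sites("CPACHA")   # P is excluded after the first Cys
excluded_last = heme_binding_sites("CAACHF")     # F is excluded after the His
```

The excluded residues matter as much as the conserved ones: a Cys-X-X-Cys-His stretch with a forbidden residue at any of the three variable positions is not reported.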

[0432] [ 1] Mathews F.S. Prog. Biophys. Mol. Biol. 45:1-56(1985). 

[0433] 128. (DAGKa) Diacylglycerol kinase accessory domain (presumed) 

[0434] Diacylglycerol (DAG) is a second messenger that acts as a protein kinase C activator. This domain is assumed to be an accessory domain; its function is unknown.

[0435] [1] Sakane F, Yamada K, Kanoh H, Yokoyama C, Tanabe T, Nature 1990;344:345-348. [2] Sakane F, Imai S, Kai M, Wada I, Kanoh H, J Biol Chem 1996;271:8394-8401. [3] Schaap D, de Widt J, van der Wal J, Vandekerckhove J, van Damme J, Gussow D, Ploegh HL, van Blitterswijk WJ, van der Bend RL, FEBS Lett 1990;275:151-158. [4] Kanoh H, Yamada K, Sakane F, Trends Biochem Sci 1990;15:47-50.
[0436] 129. (DAGKc) Diacylglycerol kinase catalytic domain (presumed)

[0437] Diacylglycerol (DAG) is a second messenger that acts as a protein kinase C activator. The catalytic domain 
is assumed from the finding of bacterial homologues. 

[0438] [1] Sakane F, Yamada K, Kanoh H, Yokoyama C, Tanabe T, Nature 1990;344:345-348. [2] Sakane F, Imai S, Kai M, Wada I, Kanoh H, J Biol Chem 1996;271:8394-8401. [3] Schaap D, de Widt J, van der Wal J, Vandekerckhove J, van Damme J, Gussow D, Ploegh HL, van Blitterswijk WJ, van der Bend RL, FEBS Lett 1990;275:151-158. [4] Kanoh H, Yamada K, Sakane F, Trends Biochem Sci 1990;15:47-50.
[0439] 130. (DAO) D-amino acid oxidases signature

[0440] D-amino acid oxidase (EC 1.4.3.3) (DAMOX or DAO) is an FAD flavoenzyme that catalyzes the oxidation of neutral and basic D-amino acids into their corresponding keto acids. DAOs have been characterized and sequenced in fungi and vertebrates, where they are known to be located in the peroxisomes. D-aspartate oxidase (EC 1.4.3.1) (DASOX) [1] is an enzyme, structurally related to DAO, which catalyzes the same reaction but is active only toward dicarboxylic D-amino acids. In DAO, a conserved histidine has been shown [2] to be important for the enzyme's catalytic activity. The conserved region around this residue has been developed as a signature pattern for these enzymes.
[0441] Consensus pattern: [LIVM](2)-H-[NHA]-Y-G-x-[GSA](2)-x-G-x(5)-G-x-A [H is a probable active site residue]
[ 1] Negri A., Ceciliani F., Tedeschi G., Simonic T., Ronchi S. J. Biol. Chem. 267:11865-11871(1992).
[ 2] Miyano M., Fukui K., Watanabe R., Takahashi S., Tada M., Kanashiro M., Miyake Y. J. Biochem. 109:171-177(1991).

[0442] 131. DEAD and DEAH box families ATP-dependent helicases signatures

A number of eukaryotic and prokaryotic proteins have been characterized [1,2,3] on the basis of their structural similarity. They all seem to be involved in ATP-dependent, nucleic-acid unwinding. Proteins currently known to belong to this family are:

- Initiation factor eIF-4A. Found in eukaryotes, this protein is a subunit of a high molecular weight complex involved in 5' cap recognition and the binding of mRNA to ribosomes. It is an ATP-dependent RNA helicase.
- PRP5 and PRP28. These yeast proteins are involved in various ATP-requiring steps of the pre-mRNA splicing process.
- PL10, a mouse protein expressed specifically during spermatogenesis.
- An3, a Xenopus putative RNA helicase, closely related to PL10.
- SPP81/DED1 and DBP1, two yeast proteins probably involved in pre-mRNA splicing and related to PL10.
- Caenorhabditis elegans helicase glh-1.
- MSS116, a yeast protein required for mitochondrial splicing.
- SPB4, a yeast protein involved in the maturation of 25S ribosomal RNA.
- p68, a human nuclear antigen. p68 has ATPase and DNA-helicase activities in vitro. It is involved in cell growth and division.
- Rm62 (p62), a Drosophila putative RNA helicase related to p68.
- DBP2, a yeast protein related to p68.
- DHH1, a yeast protein.
- DRS1, a yeast protein involved in ribosome assembly.
- MAK5, a yeast protein involved in maintenance of dsRNA killer plasmid.
- ROK1, a yeast protein.
- ste13, a fission yeast protein.
- Vasa, a Drosophila protein important for oocyte formation and specification of embryonic posterior structures.
- Me31B, a Drosophila maternally expressed protein of unknown function.
- dbpA, an Escherichia coli putative RNA helicase.
- deaD, an Escherichia coli putative RNA helicase which can suppress a mutation in the rpsB gene for ribosomal protein S2.
- rhlB, an Escherichia coli putative RNA helicase.
- rhlE, an Escherichia coli putative RNA helicase.
- srmB, an Escherichia coli protein that shows RNA-dependent ATPase activity. It probably interacts with 23S ribosomal RNA.
- Caenorhabditis elegans hypothetical proteins T26G10.1, ZK512.2 and ZK686.2.
- Yeast hypothetical protein YHR065C.
- Yeast hypothetical protein YHR169w.
- Fission yeast hypothetical protein SpAC31A2.07c.
- Bacillus subtilis hypothetical protein yxiN.

All these proteins share a number of conserved sequence motifs. Some of them are specific to this family while others are shared by other ATP-binding proteins or by proteins belonging to the helicases 'superfamily' [4,E1]. One of these motifs, called the 'D-E-A-D-box', represents a special version of the B motif of ATP-binding proteins. Some other proteins belong to a subfamily which has His instead of the second Asp and are thus said to be 'D-E-A-H-box' proteins [3,5,6,E1]. Proteins currently known to belong to this subfamily are:

- PRP2, PRP16, PRP22 and PRP43. These yeast proteins are all involved in various ATP-requiring steps of the pre-mRNA splicing process.
- Fission yeast prh1, which may be involved in pre-mRNA splicing.
- Male-less (mle), a Drosophila protein required in males for dosage compensation of X chromosome linked genes.
- RAD3 from yeast. RAD3 is a DNA helicase involved in excision repair of DNA damaged by UV light, bulky adducts or cross-linking agents. Fission yeast rad15 (rhp3) and mammalian DNA excision repair protein XPD (ERCC-2) are the homologs of RAD3.
- Yeast CHL1 (or CTF1), which is important for chromosome transmission and normal cell cycle progression in G(2)/M.
- Yeast TPS1.
- Yeast hypothetical protein YKL078w.
- Caenorhabditis elegans hypothetical proteins C06E1.10 and K03H1.2.
- Poxviruses' early transcription factor 70 Kd subunit, which acts with RNA polymerase to initiate transcription from early gene promoters.
- I8, a putative vaccinia virus helicase.
- hrpA, an Escherichia coli putative RNA helicase.

Signature patterns for both subfamilies were developed.

[0443] Consensus pattern: [LIVMF](2)-D-E-A-D-[RKEN]-x-[LIVMFYGSTN]
Consensus pattern: [GSAH]-x-[LIVMF](3)-D-E-[ALIV]-H-[NECR] 
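
As an illustration of how the two signatures separate the subfamilies, the following sketch hand-translates both consensus patterns into regular expressions and labels a sequence accordingly. The function name `classify_helicase` and the short test sequences are hypothetical and serve only to exercise the patterns.

```python
import re

# Hand-translated from the two consensus patterns given above.
DEAD_BOX = re.compile(r"[LIVMF]{2}DEAD[RKEN].[LIVMFYGSTN]")
DEAH_BOX = re.compile(r"[GSAH].[LIVMF]{3}DE[ALIV]H[NECR]")

def classify_helicase(seq: str) -> str:
    """Return 'DEAD', 'DEAH' or 'none' depending on which signature is found."""
    if DEAD_BOX.search(seq):
        return "DEAD"
    if DEAH_BOX.search(seq):
        return "DEAH"
    return "none"

# Hypothetical fragments, chosen only to contain (or lack) a signature.
print(classify_helicase("MKTAVLDEADEMLSRG"))  # DEAD
print(classify_helicase("QTSHIIIDEAHERSLT"))  # DEAH
print(classify_helicase("MKKLLPPGGSTW"))      # none
```

A hit with either expression is only a first filter; as noted in the text, members of both subfamilies also carry the ATP/GTP-binding 'A' motif.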

Note: proteins belonging to this family also contain a copy of the ATP/GTP-binding motif 'A' (P-loop) (see the relevant entry <PDOC00017>).

[ 1] Schmid S.R., Linder P. Mol. Microbiol. 6:283-292(1992).
[ 2] Linder P., Lasko P., Ashburner M., Leroy P., Nielsen P.J., Nishi K., Schnier J., Slonimski P.P. Nature 337:121-122(1989).
[ 3] Wassarman D.A., Steitz J.A. Nature 349:463-464(1991).
[ 4] Hodgman T.C. Nature 333:22-23(1988) and Nature 333:578-578(1988) (Errata).
[ 5] Harosh I., Deschavanne P. Nucleic Acids Res. 19:6331-6331(1991).
[ 6] Koonin E.V., Senkevich T.G. J. Gen. Virol. 73:989-993(1992).

[0444] 132. (DHBP_synthase) 3,4-dihydroxy-2-butanone 4-phosphate synthase 

[0445] 3,4-Dihydroxy-2-butanone 4-phosphate is biosynthesized from ribulose 5-phosphate and serves as the biosynthetic precursor for the xylene ring of riboflavin. Sometimes found as a bifunctional enzyme with GTP_cyclohydro2.
[0446] Richter G, Krieger C, Volk R, Kis K, Ritz H, Gotz E, Bacher A, Methods Enzymol 1997;280:374-382.
[0447] 133. (DHDPS) Dihydrodipicolinate synthetase signatures 

Dihydrodipicolinate synthetase (EC 4.2.1.52) (DHDPS) [1] catalyzes, in higher plants chloroplast and in many bacteria (gene dapA), the first reaction specific to the biosynthesis of lysine and of diaminopimelate. DHDPS is responsible for the condensation of aspartate semialdehyde and pyruvate by a ping-pong mechanism in which pyruvate first binds to the enzyme by forming a Schiff-base with a lysine residue. Three other proteins are structurally related to DHDPS and probably also act via a similar catalytic mechanism:

- Escherichia coli N-acetylneuraminate lyase (EC 4.1.3.3) (gene nanA), which catalyzes the condensation of N-acetyl-D-mannosamine and pyruvate to form N-acetylneuraminate.
- Rhizobium meliloti protein mosA [3], which is involved in the biosynthesis of the rhizopine 3-o-methyl-scyllo-inosamine.
- Escherichia coli hypothetical protein yjhH.

Two signature patterns for these enzymes were developed. The first one is centered on a highly conserved region in the N-terminal part of these proteins. The second signature contains a lysine residue which has been shown, in Escherichia coli dapA [2], to be the one that forms a Schiff-base with the substrate.

[0448] Consensus pattern: [GSA]-[LIVM]-[LIVMFY]-x(2)-G-[ST]-[TG]-G-E-[GASNF]-x(6)-[EQ]-

Consensus pattern: Y-[DNS]-[LIVMFA]-P-x(2)-[ST]-x(3)-[LIVMG]-x(13,14)-[LIVM]-x-[SGA]-[LIVMF]-K-[DEQAF]-[STAC] [K is involved in Schiff-base formation]

[ 1] Kaneko T., Hashimoto T., Kumpaisal R., Yamada Y. J. Biol. Chem. 265:17451-17455(1990).
[ 2] Laber B., Gomis-Rueth F.-X., Romao M.J., Huber R. Biochem. J. 288:691-695(1992).
[ 3] Murphy P.J., Trenz S.P., Grzemski W., de Bruijn F.J., Schell J. J. Bacteriol. 175:5193-5204(1993).

[0449] 134. (DHOdehase) Dihydroorotate dehydrogenase signatures 

Dihydroorotate dehydrogenase (EC 1.3.3.1) (DHOdehase) catalyzes the fourth step in the de novo biosynthesis of pyrimidine, the conversion of dihydroorotate into orotate. DHOdehase is a ubiquitous FAD flavoprotein. In bacteria (gene pyrD), DHOdehase is located on the inner side of the cytosolic membrane. In some yeasts, such as Saccharomyces cerevisiae (gene URA1), it is a cytosolic protein, while in other eukaryotes it is found in the mitochondria [1]. The sequence of DHOdehase is rather well conserved, and two signature patterns specific to this enzyme were developed. The first corresponds to a region in the N-terminal section of the enzyme, while the second is located in the C-terminal section and seems to be part of the FAD-binding domain.

Consensus pattern: [GS]-x(4)-[GK]-[GSTA]-[LIVFSTA]-[GT]-x(3)-[NQR]-x-G-[NHY]-x(2)-P-[RT]
[0450] Consensus pattern: [LIVM](2)-[GSA]-x-G-G-[IV]-x-[STGDN]-x(3)-[ACV]-x(6)-G-A
[0451] [ 1] Nagy M., Lacroute F., Thomas D. Proc. Natl. Acad. Sci. U.S.A. 89:8966-8970(1992).
[0452] 135. (DMRL_synthase) 6,7-dimethyl-8-ribityllumazine synthase

[0453] 136. (DNA_methylase) C-5 cytosine-specific DNA methylases signatures

C-5 cytosine-specific DNA methylases (EC 2.1.1.73) (C5 Mtase) are enzymes that specifically methylate the C-5 carbon of cytosines in DNA [1,2,3]. Such enzymes are found in the proteins described below.

- As a component of type II restriction-modification systems in prokaryotes and some bacteriophages. Such enzymes recognize a specific DNA sequence where they methylate a cytosine. In doing so, they protect DNA from cleavage by type II restriction enzymes that recognize the same sequence. The sequences of a large number of type II C-5 Mtases are known.
- In vertebrates, there are a number of C-5 Mtases that methylate CpG dinucleotides. The sequence of the mammalian enzyme is known.

C-5 Mtases share a number of short conserved regions. Two of them were selected. The first is centered around a conserved Pro-Cys dipeptide in which the cysteine has been shown [4] to be involved in the catalytic mechanism; it appears to form a covalent intermediate with the C6 position of cytosine. The second region is located at the C-terminal extremity in type-II enzymes.

[0454] Consensus pattern: [DENKS]-x-[FLIV]-x(2)-[GSTC]-x-P-C-x(2)-[FYWLIM]-S [C is the active site residue]
Consensus pattern: [RKQGTF]-x(2)-G-N-[STAG]-[LIVMF]-x(3)-[LIVMT]-x(3)-[LIVM]-x(3)-[LIVM]- 

[ 1] Posfai J., Bhagwat A.S., Roberts R.J. Gene 74:261-263(1988).
[ 2] Kumar S., Cheng X., Klimasauskas S., Mi S., Posfai J., Roberts R.J., Wilson G.G. Nucleic Acids Res. 22:1-10(1994).
[ 3] Lauster R., Trautner T.A., Noyer-Weidner M. J. Mol. Biol. 206:305-312(1989).
[ 4] Chen L., McMillan A.M., Chang W., Ezak-Nipkay K., Lane W.S., Verdine G.L. Biochemistry 30:11018-11025(1991).


[0455] 137. (DNAphotolyase) DNA photolyases class 2 signatures

Deoxyribodipyrimidine photolyase (EC 4.1.99.3) (DNA photolyase) [1,2] is a DNA repair enzyme. It binds to UV-damaged DNA containing pyrimidine dimers and, upon absorbing a near-UV photon (300 to 500 nm), breaks the cyclobutane ring joining the two pyrimidines of the dimer. DNA photolyase is an enzyme that requires two chromophore-cofactors for its activity: a reduced FADH2 and either 5,10-methenyltetrahydrofolate (5,10-MTHF) or an oxidized 8-hydroxy-5-deazaflavin (8-HDF) derivative (F420). The folate or deazaflavin chromophore appears to function as an antenna, while the FADH2 chromophore is thought to be responsible for electron transfer. On the basis of sequence similarities [3], DNA photolyases can be grouped into two classes. The second class contains enzymes from Myxococcus xanthus, methanogenic archaebacteria, insects, fish and marsupial mammals. It is not yet known what second cofactor is bound to class 2 enzymes. There are a number of conserved sequence regions in all known class 2 DNA photolyases, especially in the C-terminal part. Two of these regions were selected as signature patterns.
Consensus pattern: F-x-E-E-x-[LIVM](2)-R-R-E-L-x(2)-N-F-
Consensus pattern: G-x-H-D-x(2)-W-x-E-R-x-[LIVM]-F-G-K-[LIVM]-R-[FY]-M-N-

[ 1] Sancar G.B., Sancar A. Trends Biochem. Sci. 12:259-261(1987).
[ 2] Jorns M.S. Biofactors 2:207-211(1990).
[ 3] Yasui A., Eker A.P.M., Yasuhira S., Yajima H., Kobayashi T., Takao M., Oikawa A. EMBO J. 13:6143-6151(1994).

[0456] (DNAphotolyase2) DNA photolyases class 1 signatures 

Deoxyribodipyrimidine photolyase (EC 4.1.99.3) (DNA photolyase) [1,2] is a DNA repair enzyme. It binds to UV-damaged DNA containing pyrimidine dimers and, upon absorbing a near-UV photon (300 to 500 nm), breaks the cyclobutane ring joining the two pyrimidines of the dimer. DNA photolyase is an enzyme that requires two chromophore-cofactors for its activity: a reduced FADH2 and either 5,10-methenyltetrahydrofolate (5,10-MTHF) or an oxidized 8-hydroxy-5-deazaflavin (8-HDF) derivative (F420). The folate or deazaflavin chromophore appears to function as an antenna, while the FADH2 chromophore is thought to be responsible for electron transfer. On the basis of sequence similarities [3], DNA photolyases can be grouped into two classes. The first class contains enzymes from Gram-negative and Gram-positive bacteria, the halophilic archaebacteria Halobacterium halobium, fungi and plants. Class 1 enzymes bind either 5,10-MTHF (E. coli, fungi, etc.) or 8-HDF (S. griseus, H. halobium). This family also includes Arabidopsis cryptochromes 1 (CRY1) and 2 (CRY2), which are blue light photoreceptors that mediate blue light-induced gene expression. There are a number of conserved sequence regions in all known class 1 DNA photolyases, especially in the C-terminal part. Two of these regions were selected as signature patterns.

[0457] Consensus pattern: T-G-x-P-[LIVM](2)-D-A-x-M-[RA]-x-[LIVM]-

Consensus pattern: [DN]-R-x-R-[LIVM](2)-x-[STA](2)-F-[LIVMFA]-x-K-x-L-x(2,3)-W-[KRQ]-

[ 1] Sancar G.B., Sancar A. Trends Biochem. Sci. 12:259-261(1987).
[ 2] Jorns M.S. Biofactors 2:207-211(1990).
[ 3] Yasui A., Eker A.P.M., Yasuhira S., Yajima H., Kobayashi T., Takao M., Oikawa A. EMBO J. 13:6143-6151(1994).
[ 4] Lin C., Ahmad M., Cashmore A.R. Plant J. 10:893-902(1996).


[0458] 138. (DNA_pol_A) DNA polymerase family A signature

Replicative DNA polymerases (EC 2.7.7.7) are the key enzymes catalyzing the accurate replication of DNA. They require either a small RNA molecule or a protein as a primer for the de novo synthesis of a DNA chain. On the basis of sequence similarities, a number of DNA polymerases have been grouped together [1,2,3] under the designation of DNA polymerase family A. The polymerases that belong to this family are listed below.


- Escherichia coli and various other bacterial polymerase I (gene polA).
- Thermus aquaticus Taq polymerase.
- Bacteriophage sp01 polymerase.
- Bacteriophage sp02 polymerase.
- Bacteriophage T5 polymerase.
- Bacteriophage T7 polymerase.
- Mycobacteriophage L5 polymerase.
- Yeast mitochondrial polymerase gamma (gene MIP1).


[0459] Five regions of similarity are found in all the above polymerases. One of these conserved regions, known as 'motif B' [1], is located in a domain which, in Escherichia coli polA, has been shown to bind deoxynucleotide triphosphate substrates; it contains a conserved tyrosine which has been shown, by photo-affinity labelling, to be in the active site; a conserved lysine, also part of this motif, can be chemically labelled using pyridoxal phosphate. This conserved region was used as a signature for this family of DNA polymerases.
[0460] Consensus pattern: R-x(2)-[GSAV]-K-x(3)-[LIVMFY]-[AGQ]-x(2)-Y-x(2)-[GS]-x(3)-[LIVMA] Sequences known to belong to this class detected by the pattern: ALL.

[ 1] Delarue M., Poch O., Tordo N., Moras D., Argos P. Protein Eng. 3:461-467(1990).
[ 2] Ito J., Braithwaite D.K. Nucleic Acids Res. 19:4045-4057(1991).
[ 3] Braithwaite D.K., Ito J. Nucleic Acids Res. 21:787-802(1993).

[0461] 139. (DNA_pol_viral_C) DNA polymerase (viral) C-terminal domain

Number of members: 128
[0462] 140. (DNA_topoisoII) DNA topoisomerase II signature

DNA topoisomerase II (EC 5.99.1.3) [1,2,3,4,E1] is one of the two types of enzyme that catalyze the interconversion of topological DNA isomers. Type II topoisomerases are ATP-dependent and act by passing a DNA segment through a transient double-strand break. Topoisomerase II is found in phages, archaebacteria, prokaryotes, eukaryotes, and in African Swine Fever virus (ASF). In bacteriophage T4, topoisomerase II consists of three subunits (the product of genes 39, 52 and 60). In prokaryotes and in archaebacteria the enzyme, known as DNA gyrase, consists of two subunits (genes gyrA and gyrB [E2]). In some bacteria, a second type II topoisomerase has been identified; it is known as topoisomerase IV and is required for chromosome segregation. It also consists of two subunits (genes parC and parE). In eukaryotes, type II topoisomerase is a homodimer.

[0463] There are many regions of sequence homology between the different subtypes of topoisomerase II. The relation between the different subunits is shown in the following representation:


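<----------------------About 1400 residues---------------------->

[-------Protein 39-*-------][---------Protein 52----------]  Phage T4
[-------gyrB-------*-------][---------gyrA----------------]  Prokaryote II, Archaebacteria
[-------parE-------*-------][---------parC----------------]  Prokaryote IV
[------------------*--------------------------------------]  Eukaryote and ASF

'*': Position of the pattern.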

[0464] As a signature pattern for this family of proteins, a region that contains a highly conserved pentapeptide was selected. The pattern is located in gyrB, in parE, and in protein 39 of phage T4 topoisomerase.
[0465] Consensus pattern: [LIVMA]-x-E-G-[DN]-S-A-x-[STAG] Sequences known to belong to this class detected by the pattern: ALL.

[ 1] Sternglanz R. Curr. Opin. Cell Biol. 1:533-535(1990).
[ 2] Bjornsti M.-A. Curr. Opin. Struct. Biol. 1:99-103(1991).
[ 3] Sharma A., Mondragon A. Curr. Opin. Struct. Biol. 5:39-47(1995).
[ 4] Roca J. Trends Biochem. Sci. 20:156-160(1995).

[0466] 141. (DSPc) Tyrosine specific protein phosphatases signature and profiles

Tyrosine specific protein phosphatases (EC 3.1.3.48) (PTPase) [1 to 5] are enzymes that catalyze the removal of a phosphate group attached to a tyrosine residue. These enzymes are very important in the control of cell growth, proliferation, differentiation and transformation. Multiple forms of PTPase have been characterized and can be classified into two categories: soluble PTPases and transmembrane receptor proteins that contain PTPase domain(s). The currently known PTPases are listed below.

Soluble PTPases.

- PTPN1 (PTP-1B).
- PTPN2 (T-cell PTPase; TC-PTP).
- PTPN3 (H1) and PTPN4 (MEG), enzymes that contain an N-terminal band 4.1-like domain (see <PDOC00566>) and could act at junctions between the membrane and cytoskeleton.
- PTPN5 (STEP).
- PTPN6 (PTP-1C; HCP; SHP) and PTPN11 (PTP-2C; SH-PTP3; Syp), enzymes which contain two copies of the SH2 domain at their N-terminal extremity. The Drosophila protein corkscrew (gene csw) also belongs to this subgroup.
- PTPN7 (LC-PTP; Hematopoietic protein-tyrosine phosphatase; HePTP).
- PTPN8 (70Z-PEP).
- PTPN9 (MEG2).
- PTPN12 (PTP-G1; PTP-P19).
- Yeast PTP1.
- Yeast PTP2, which may be involved in the ubiquitin-mediated protein degradation pathway.
- Fission yeast pyp1 and pyp2, which play a role in inhibiting the onset of mitosis.
- Fission yeast pyp3, which contributes to the dephosphorylation of cdc2.
- Yeast CDC14, which may be involved in chromosome segregation.
- Yersinia virulence plasmid PTPases (gene yopH).
- Autographa californica nuclear polyhedrosis virus 19 Kd PTPase.

Dual specificity PTPases.

- DUSP1 (PTPN10; MAP kinase phosphatase-1; MKP-1), which dephosphorylates MAP kinase on both Thr-183 and Tyr-185.
- DUSP2 (PAC-1), a nuclear enzyme that dephosphorylates MAP kinases ERK1 and ERK2 on both Thr and Tyr residues.
- DUSP3 (VHR).
- DUSP4 (HVH2).
- DUSP5 (HVH3).
- DUSP6 (Pyst1; MKP-3).
- DUSP7 (Pyst2; MKP-X).
- Yeast MSG5, a PTPase that dephosphorylates MAP kinase FUS3.
- Yeast YVH1.
- Vaccinia virus H1 PTPase, a dual specificity phosphatase.

Receptor PTPases.

Structurally, all known receptor PTPases are made up of a variable length extracellular domain, followed by a transmembrane region and a C-terminal catalytic cytoplasmic domain. Some of the receptor PTPases contain fibronectin type III (FN-III) repeats, immunoglobulin-like domains, MAM domains or carbonic anhydrase-like domains in their extracellular region. The cytoplasmic region generally contains two copies of the PTPase domain. The first seems to have enzymatic activity, while the second is inactive but seems to affect substrate specificity of the first. In these domains, the catalytic cysteine is generally conserved but some other, presumably important, residues are not. In the following table, the domain structure of known receptor PTPases is shown:

                                        Extracellular         Intracellular
                                        Ig   FN-3  CAH  MAM   PTPase
Leukocyte common antigen (LCA) (CD45)   0    2     0    0     2
Leukocyte antigen related (LAR)         3    8     0    0     2
Drosophila DLAR                         3    9     0    0     2
Drosophila DPTP                         2    2     0    0     2
PTP-alpha (LRP)                         0    0     0    0     2
PTP-beta                                0    16    0    0     1
PTP-gamma                               0    1     1    0     2
PTP-delta                               0    >7    0    0     2
PTP-epsilon                             0    0     0    0     2
PTP-kappa                               1    4     0    1     2
PTP-mu                                  1    4     0    1     2
PTP-zeta                                0    1     1    0     2

PTPase domains consist of about 300 amino acids. There are two conserved cysteines; the second one has been shown to be absolutely required for activity. Furthermore, a number of conserved residues in its immediate vicinity have also been shown to be important. A signature pattern for PTPase domains was derived, centered on the active site cysteine. There are three profiles for PTPases: the first one spans the complete domain and is not specific to any subtype, the second is specific to dual-specificity PTPases, and the third to the PTP subfamily.

[0467] Consensus pattern: [LIVMF]-H-C-x(2)-G-x(3)-[STC]-[STAGP]-x-[LIVMFY] [C is the active site residue]
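
Because the signature is centered on the active site cysteine, a match also gives the position of that residue. The sketch below, which assumes a hand-translated regular expression and uses a hypothetical function name and test fragment, reports the 1-based position of the cysteine for each non-overlapping match.

```python
import re

# Hand-translated PTPase signature; the capture group marks the active site Cys.
PTPASE_SITE = re.compile(r"[LIVMF]H(C)..G...[STC][STAGP].[LIVMFY]")

def active_site_cysteines(seq: str) -> list[int]:
    """Return 1-based positions of the signature cysteine in each match."""
    return [m.start(1) + 1 for m in PTPASE_SITE.finditer(seq)]

# Hypothetical fragment built around a VHCSAGIGRSGTF-like stretch.
fragment = "AAPIVVHCSAGIGRSGTFCL"
print(active_site_cysteines(fragment))  # [8]
```

For receptor PTPases carrying two copies of the domain, the same call would be expected to return two positions, provided both copies retain the signature residues.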

[ 1] Fischer E.H., Charbonneau H., Tonks N.K. Science 253:401-406(1991). 
[ 2] Charbonneau H., Tonks N.K. Annu. Rev. Cell Biol. 8:463-493(1992). 
[ 3] Trowbridge I.S. J. Biol. Chem. 266:23517-23520(1991).
[ 4] Tonks N.K., Charbonneau H. Trends Biochem. Sci. 14:497-500(1989). 
[ 5] Hunter T. Cell 58:1013-1016(1989). 

[0468] 142. (DUF10) Uncharacterized protein family UPF0076 signature 

The following uncharacterized proteins have been shown [1] to share regions of similarities:

- Goat antigen UK114, a human homolog and the corresponding rat protein, which is known as perchloric acid soluble protein (PSP1). PSP1 [2] may inhibit an initiation stage of cell-free protein synthesis.
- Mouse heat-responsive protein HRSP12.
- Yeast chromosome V hypothetical protein YER057c.
- Yeast chromosome IX hypothetical protein YIL051C.
- Caenorhabditis elegans hypothetical protein C23G10.2.
- Escherichia coli hypothetical protein ycdK.
- Escherichia coli hypothetical protein yhaR.
- Escherichia coli hypothetical protein yjgF and HI0719, the corresponding Haemophilus influenzae protein.
- Escherichia coli hypothetical protein yoaB.
- Bacillus subtilis hypothetical protein yabJ.
- Haemophilus influenzae hypothetical protein HI1627.
- Helicobacter pylori hypothetical protein HP0944.
- Lactococcus lactis aldR.
- Myxococcus xanthus dfrA.
- Synechocystis strain PCC 6803 hypothetical protein slr0709.
- Rhizobium strain NGR234 symbiotic plasmid hypothetical protein y4sK.
- Pyrococcus horikoshii hypothetical protein PH0854.

These are small proteins of around 15 Kd whose sequence is highly conserved. As a signature pattern, a well conserved region located in the C-terminal part of these proteins was selected.

[0469] Consensus pattern: [PA]-[ASTPV]-R-[SACVF]-x-[LIVMFY]-x(2)-[GSAKR]-x-[LMVA]-x(5,8)-[LIVM]-E-[MI]- 
[ 1] Bairoch A, Unpublished observations (1995). 

[ 2] Oka T., Tsuji K., Noda C., Sakai K., Hong Y.-M., Suzuki I., Munoz S., Natori Y. J. Biol. Chem. 270:30060-30067

[0470] 143. (DUF3)Domain of Unknown Function 3 

Domain apparently occurring exclusively in eubacteria. Unknown function. 

[0471] 144. (DUF6) Integral membrane protein 

[0472] This family includes many hypothetical membrane proteins of unknown function. Many of the proteins contain two copies of the aligned region.
[0473] 145. (DUF7) Integral membrane protein 

[0474] This family includes many hypothetical membrane proteins of unknown function. Swiss:P14502 has been implicated in resistance to ethidium bromide.

[0475] 146. (DapB) Dihydrodipicolinate reductase signature 

Dihydrodipicolinate reductase (EC 1.3.1.26) catalyzes the second step in the biosynthesis of diaminopimelic acid and lysine, the NAD or NADP-dependent reduction of 2,3-dihydrodipicolinate into 2,3,4,5-tetrahydrodipicolinate. This enzyme is present in bacteria (gene dapB) and higher plants. As a signature pattern, the best conserved region in this enzyme was selected. It is located in the central section and is part of the substrate-binding region [1].
[0476] Consensus pattern: E-[IV]-x-E-x-H-x(3)-K-x-D-x-P-S-G-T-A-
[0477] [ 1] Scapin G., Blanchard J.S., Sacchettini J.C. Biochemistry 34:3502-3512(1995).
[0478] 147. DedA family 

[0479] This family combines the DedA related proteins and YIAN/YGIK family. Members of this family are not functionally characterised. These proteins contain multiple predicted transmembrane regions.
[0480] 148. DegT/DnrJ/EryC1/StrS family 

[0481] The members of this family exhibit some characteristics of the sensor protein of two-component signal transduction systems; however, none of the members shows any sequence similarity to these protein kinases. The members of this family do have the typical helix-turn-helix motif of DNA binding proteins.

[0482] [1] Stutzman-Engwall KJ, Otten SL, Hutchinson CR, J Bacteriol 1992;174:144-154.
[0483] 149. (Desaturase) Fatty acid desaturases signatures 

Fatty acid desaturases (EC 1.14.99.-) are enzymes that catalyze the insertion of a double bond at the delta position of fatty acids. There seem to be two distinct families of fatty acid desaturases which do not seem to be evolutionarily related.

Family 1 is composed of:

- Stearoyl-CoA desaturase (SCD) (EC 1.14.99.5) [1]. SCD is a key regulatory enzyme of unsaturated fatty acid biosynthesis. SCD introduces a cis double bond at the delta(9) position of fatty acyl-CoA's such as palmitoleoyl- and oleoyl-CoA. SCD is a membrane-bound enzyme that is thought to function as a part of a multienzyme complex in the endoplasmic reticulum of vertebrates and fungi.

As a signature pattern for this family, a conserved region in the C-terminal part of these enzymes was selected; this region is rich in histidine residues and in aromatic residues.

Family 2 is composed of:

- Plant stearoyl-acyl-carrier-protein desaturase (EC 1.14.99.6) [2]. These enzymes catalyze the introduction of a double bond at the delta(9) position of stearoyl-ACP to produce oleoyl-ACP. This enzyme is responsible for the conversion of saturated fatty acids to unsaturated fatty acids in the synthesis of vegetable oils.
- Cyanobacteria desA [3], an enzyme that can introduce a second cis double bond at the delta(12) position of fatty acid bound to membrane glycerolipids. DesA is involved in chilling tolerance, the phase transition temperature of lipids of cellular membranes being dependent on the degree of unsaturation of fatty acids of the membrane lipids.

As a signature pattern for this family, a conserved region in the C-terminal part of these enzymes was selected.

[0484] Consensus pattern: G-E-x-[FY]-H-N-[FY]-H-H-x-F-P-x-D-Y-

Consensus pattern: [ST]-[SA]-x(3)-[QR]-[LI]-x(5,6)-D-Y-x(2)-[LIVMFYW]-[LIVM]-[DE]-

[ 1] Kaestner K.H., Ntambi J.M., Kelly T.J. Jr., Lane M.D. J. Biol. Chem. 264:14755-14761(1989).
[ 2] Shanklin J., Somerville C.R. Proc. Natl. Acad. Sci. U.S.A. 88:2510-2514(1991).
[ 3] Wada K., Gombos Z., Murata N. Nature 347:200-203(1990).

[0485] 150. Dihydroorotase signatures

Dihydroorotase (EC 3.5.2.3) (DHOase) catalyzes the third step in the de novo biosynthesis of pyrimidine, the conversion of ureidosuccinic acid (N-carbamoyl-L-aspartate) into dihydroorotate. Dihydroorotase binds a zinc ion which is required for its catalytic activity [1]. In bacteria, DHOase is a dimer of identical chains of about 400 amino-acid residues (gene pyrC). In higher eukaryotes, DHOase is part of a large multi-functional protein known as 'rudimentary' in Drosophila and CAD in mammals, which catalyzes the first three steps of pyrimidine biosynthesis [2]. The DHOase domain is located in the central part of this polyprotein. In yeasts, DHOase is encoded by a monofunctional protein (gene URA4). However, a defective DHOase domain [3] is found in a multifunctional protein (gene URA2) that catalyzes the first two steps of pyrimidine biosynthesis. The comparison of DHOase sequences from various sources shows [4] that there are two highly conserved regions. The first, located in the N-terminal extremity, contains two histidine residues suggested [3] to be involved in binding the zinc ion. The second is found in the C-terminal part. Signature patterns for both regions have been developed. Allantoinase (EC 3.5.2.5) is the enzyme that hydrolyzes allantoin into allantoate. In yeast (gene DAL1) [5], it is the first enzyme in the allantoin degradation pathway; in amphibians [6] and fish it catalyzes the second step in the degradation of uric acid. The sequence of allantoinase is evolutionarily related to that of DHOases.
[0486] Consensus pattern: D-[LIVMFYWSAP]-H-[LIVA]-H-[LIVF]-[RN]-x-[PGANF] [The two H's are probable zinc ligands]

Consensus pattern: [GA]-[ST]-D-x-A-P-H-x(4)-K- 

[ 1] Brown D.C., Collins K.D. J. Biol. Chem. 266:1597-1604(1991).
[ 2] Davidson J.N., Chen K.C., Jamison R.S., Musmanno L.A., Kern C.B. BioEssays 15:157-164(1993).
[ 3] Souciet J.-L., Nagy M., Le Gouar M., Lacroute F., Potier S. Gene 79:59-70(1989).
[ 4] Guyonvarch A., Nguyen-Juilleret M., Hubert J.-C., Lacroute F. Mol. Gen. Genet. 212:134-141(1988).
[ 5] Buckholz R.G., Cooper T.G. Yeast 7:913-923(1991).
[ 6] Hayashi S., Jain S., Chu R., Alvares K., Xu B., Erfurth F., Usuda M., Rao M.S., Reddy S.K., Noguchi T., Reddy J.K., Yeldandi A.Y. J. Biol. Chem. 269:12269-12276(1994).

[0487] 151. dnaJ domains signatures and profile 

[0488] The prokaryotic heat shock protein dnaJ interacts with the chaperone hsp70-like dnaK protein [1]. Structurally,
the dnaJ protein consists of an N-terminal conserved domain (called 'J' domain) of about 70 amino acids, a glycine-rich
region ('G' domain) of about 30 residues, a central domain containing four repeats of a CXXCXGXG motif ('CRR'
domain) and a C-terminal region of 120 to 170 residues. Such a structure is shown in the following schematic
representation:


+------------+-+-------+-+----------+------------+
| N-terminal | | Gly-R | | CXXCXGXG | C-terminal |
+------------+-+-------+-+----------+------------+

[0489] It has been shown [2] that the 'J' domain as well as the 'CRR' domain are also found in other prokaryotic and
eukaryotic proteins which are listed below.


a) Proteins containing both a 'J' and a 'CRR' domain:

- Yeast protein MAS5/YDJ1, which seems to be involved in mitochondrial protein import.
- Yeast protein MDJ1, involved in mitochondrial biogenesis and protein folding.
- Yeast protein SCJ1, involved in protein sorting.
- Yeast protein XDJ1.
- Plants dnaJ homologs (from leek and cucumber).
- Human HDJ2, a dnaJ homolog of unknown function.
- Yeast hypothetical protein YNL077w.

b) Proteins containing a 'J' domain without a 'CRR' domain:

- Rhizobium fredii nolC, a protein involved in cultivar-specific nodulation of soybean.
- Escherichia coli cbpA [3], a protein that binds curved DNA.
- Yeast protein SEC63/NPL1, important for protein assembly into the endoplasmic reticulum and the nucleus.
- Yeast protein SIS1, required for nuclear migration during mitosis.
- Yeast protein CAJ1.
- Yeast hypothetical protein YFR041C.
- Yeast hypothetical protein YIR004w.
- Yeast hypothetical protein YJL162c.
- Plasmodium falciparum ring-infected erythrocyte surface antigen (RESA). RESA, whose function is not known,
is associated with the membrane skeleton of newly invaded erythrocytes.
- Human HDJ1.
- Human HSJ1, a neuronal protein.
- Drosophila cysteine-string protein (csp).

[0490] A signature pattern for the 'J' domain was developed, based on conserved positions in the C-terminal half of
this domain. A pattern for the 'CRR' domain, based on the first two copies of that motif, was also developed. A profile
for the 'J' domain was also developed.

[0491] Consensus pattern: [FY]-x(2)-[LIVMA]-x(3)-[FYWHNT]-[DENQSA]-x-L-x-[DN]-x(3)-[KR]-x(2)-[FYI]-
Consensus pattern: C-[DEGSTHKR]-x-C-x-G-x-[GK]-[AGSDM]-x(2)-[GSNKR]-x(4,6)-C-x(2,3)-C-x-G-x-G-
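As an illustration of the 'CRR' domain described above, the following sketch counts copies of the CXXCXGXG motif in a sequence, where X is taken to mean any residue. It is a simplified check on the bare motif, not the full consensus pattern, and the example sequence is hypothetical.

```python
import re

# CXXCXGXG with X = any residue, written as a regular expression.
CRR_MOTIF = re.compile(r"C..C.G.G")


def count_crr_repeats(sequence: str) -> int:
    """Count non-overlapping CXXCXGXG motifs in a protein sequence."""
    return sum(1 for _ in CRR_MOTIF.finditer(sequence))


# Hypothetical sequence carrying four motif copies separated by short linkers.
toy_crr = "AACPTCNGRGAAACSACHGSGTTCEHCDGKGSSCKTCQGEGLL"
print(count_crr_repeats(toy_crr))
```

A count of four would be consistent with the central domain of dnaJ as described in the text; a count of zero is expected for proteins that carry only the 'J' domain.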

[1] Cyr D.M., Langer T., Douglas M.G. Trends Biochem. Sci. 19:176-181(1994).

[2] Bork P., Sander C., Valencia A., Bukau B. Trends Biochem. Sci. 17:129-129(1992).

[3] Ueguchi C., Kaneda M., Yamada H., Mizuno T. Proc. Natl. Acad. Sci. U.S.A. 91:1054-1058(1994).
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[0492] 152. 
[0493] 153. Dwarfin

[0494] This family, known as the dwarfins, also includes the Drosophila protein MAD. The N-terminus of MAD can
bind to DNA [2].

[0495] [1] Yingling JM, Das P, Savage C, Zhang M, Padgett RW, Wang XF, Proc Natl Acad Sci USA 1996;93:
8940-8944. [2] Kim J, Johnson K, Chen HJ, Carroll S, Laughon A, Nature 1997;388:304-308.
[0496] 154. Dynein light chain type 1 signature 

Dynein is a multisubunit microtubule-dependent motor enzyme that acts as the force generating protein of eukaryotic
cilia and flagella. The cytoplasmic isoform of dynein acts as a motor for the intracellular retrograde motility of vesicles
and organelles along microtubules. Dynein is composed of a number of ATP-binding large subunits, intermediate size
subunits and small subunits. Among the small subunits, there is a family [1,2] of highly conserved proteins which consists
of: - Chlamydomonas reinhardtii flagellar outer arm dynein 8 Kd and 11 Kd light chains. - Higher eukaryotes cytoplasmic
dynein light chain 1. - Yeast cytoplasmic dynein light chain 1 (gene DYN2 or SLC1). - Caenorhabditis elegans hypothetical
dynein light chains M18.2 and T26A5.9. These proteins have from 89 to 120 amino acids. As a signature pattern,
a highly conserved region was selected.

Consensus pattern: H-x-I-x-G-[KR]-x-F-[GA]-S-x-V-[ST]-[HY]-E

[ 1] King S.M., Patel-King R.S. J. Biol. Chem. 270:11445-11452(1995).

[ 2] Dick T., Ray K., Salz H.K., Chia W. Mol. Cell. Biol. 16:1966-1977(1996).

[0497] 155. dUTPase 

[0498] dUTPase hydrolyzes dUTP to dUMP and pyrophosphate. 

[0499] [1] Cedergren-Zeppezauer ES, Larsson G, Nyman PO, Dauter Z, Wilson KS, Nature 1992;355:740-743. [2]
Mol CD, Harris JM, McIntosh EM, Tainer JA, Structure 1996;4:1077-1092.

[0500] 156. (dCMP_cyt_deam) Cytidine and deoxycytidylate deaminases zinc-binding region signature

Cytidine deaminase (EC 3.5.4.5) (cytidine aminohydrolase) catalyzes the hydrolysis of cytidine into uridine and ammonia
while deoxycytidylate deaminase (EC 3.5.4.12) (dCMP deaminase) hydrolyzes dCMP into dUMP. Both enzymes
are known to bind zinc and to require it for their catalytic activity [1,2]. These two enzymes do not share any sequence
similarity with the exception of a region that contains three conserved histidine and cysteine residues which are thought
to be involved in the binding of the catalytic zinc ion. Such a region is also found in other proteins [3,4]: - Yeast cytosine
deaminase (EC 3.5.4.1) (gene FCY1) which transforms cytosine into uracil. - Mammalian apolipoprotein B mRNA
editing protein, responsible for the posttranscriptional editing of a CAA codon into a UAA (stop) codon in the APOB
mRNA. - Riboflavin biosynthesis protein ribG, which converts 2,5-diamino-6-(ribosylamino)-4(3H)-pyrimidinone
5'-phosphate into 5-amino-6-(ribosylamino)-2,4(1H,3H)-pyrimidinedione 5'-phosphate. - Bacillus cereus blasticidin-S
deaminase (EC 3.5.4.23), which catalyzes the deamination of the cytosine moiety of the antibiotics blasticidin S,
cytomycin and acetylblasticidin S. - Bacillus subtilis protein comEB. This protein is required for the binding and uptake
of transforming DNA. - Bacillus subtilis hypothetical protein yaaJ. - Escherichia coli hypothetical protein yfhC. - Yeast
hypothetical protein YJL035c. A signature pattern for this zinc-binding region was derived.

[0501] Consensus pattern: [CH]-[AGV]-E-x(2)-[LIVMFGAT]-[LIVM]-x(17,33)-P-C-x(2,8)-C-x(3)-[LIVM] [The C's and
H are zinc ligands]
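The zinc-binding signature above marks three residues (the leading C or H and the two conserved cysteines) as zinc ligands. A minimal sketch of how those three positions might be reported for a matching sequence is given below. The regular expression is a hand translation of the consensus pattern, the group names are arbitrary, and the test sequence is hypothetical.

```python
import re

# Hand-written regex equivalent of the consensus pattern; the three named
# groups mark the residues annotated as zinc ligands.
DEAMINASE_ZN = re.compile(
    r"(?P<ligand1>[CH])[AGV]E.{2}[LIVMFGAT][LIVM].{17,33}"
    r"P(?P<ligand2>C).{2,8}(?P<ligand3>C).{3}[LIVM]"
)


def zinc_ligand_positions(sequence: str):
    """Return 1-based positions of the three putative zinc ligands, or None."""
    m = DEAMINASE_ZN.search(sequence)
    if m is None:
        return None
    return tuple(m.start(name) + 1 for name in ("ligand1", "ligand2", "ligand3"))


# Hypothetical sequence built to satisfy the pattern with the shortest spacers.
toy_deaminase = "MM" + "HAEKKLI" + "S" * 17 + "PC" + "AA" + "C" + "TTT" + "L"
print(zinc_ligand_positions(toy_deaminase))
```

Only the first match is reported; proteins with more than one such region would need an iteration over all matches.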

[ 1] Yang C., Carlow D., Wolfenden R., Short S.A. Biochemistry 31:4168-4174(1992).
[ 2] Moore J.T., Silversmith R.E., Maley G.F., Maley F. J. Biol. Chem. 268:2288-2291(1993).
[ 3] Reizer J., Buskirk S., Bairoch A., Reizer A., Saier M.H. Jr. Protein Sci. 3:853-856(1994).
[ 4] Bhattacharya S., Navaratnam N., Morrison J.R., Scott J., Taylor W.R. Trends Biochem. Sci. 19:105-106(1994).

[0502] 157. Dehydrins signatures 

A number of proteins are produced by plants that experience water-stress. Water-stress takes place when the water
available to a plant falls below a critical level. The plant hormone abscisic acid (ABA) appears to modulate the response
of plants to water-stress. Proteins that are expressed during water-stress are called dehydrins [1,2] or LEA group 2
proteins [3]. The proteins that belong to this family are listed below.

- Arabidopsis thaliana XERO 1, XERO 2 (LTI30), RAB18, ERD10 (LTI45), ERD14 and COR47.
- Barley dehydrins B8, B9, B17, and B18.
- Cotton LEA protein D-11.
- Craterostigma plantagineum desiccation-related proteins A and B.
- Maize dehydrin M3 (RAB-17).
- Pea dehydrins DHN1, DHN2, and DHN3.
- Radish LEA protein.
- Rice proteins RAB 16B, 16C, 16D, RAB21, and RAB25.
- Tomato TAS14.
- Wheat dehydrin RAB 15 and cold-shock proteins cor410, cs66 and cs120.

[0503] Dehydrins share a number of structural features. One of the most notable features is the presence, in their
central region, of a continuous run of five to nine serines followed by a cluster of charged residues. Such a region has
been found in all known dehydrins so far with the exception of pea dehydrins. A second conserved feature is the
presence of two copies of a lysine-rich octapeptide; the first copy is located just after the cluster of charged residues
that follows the poly-serine region and the second copy is found at the C-terminal extremity. Signature patterns for
both regions were derived.

[0504] Consensus pattern: S(5)-[DE]-x-[DE]-G-x(1,2)-G-x(0,1)-[KR](4)
Consensus pattern: [KR]-[LIM]-K-[DE]-K-[LIM]-P-G-
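The two dehydrin features described above (a poly-serine segment followed by charged residues, and two copies of a lysine-rich octapeptide) can be checked independently. The sketch below does so under the assumption that the second consensus pattern reads [KR]-[LIM]-K-[DE]-K-[LIM]-P-G, which gives the eight residues expected of an octapeptide. The function name and the test sequence are hypothetical.

```python
import re

# Poly-serine segment followed by the charged cluster.
SERINE_SEGMENT = re.compile(r"S{5}[DE].[DE]G.{1,2}G.{0,1}[KR]{4}")
# Lysine-rich octapeptide (assumed reading of the second consensus pattern).
K_OCTAPEPTIDE = re.compile(r"[KR][LIM]K[DE]K[LIM]PG")


def dehydrin_features(sequence: str) -> dict:
    """Report whether the serine segment is present and how many octapeptide copies occur."""
    return {
        "has_serine_segment": SERINE_SEGMENT.search(sequence) is not None,
        "octapeptide_copies": len(K_OCTAPEPTIDE.findall(sequence)),
    }


# Hypothetical sequence: serine run, charged cluster, then two octapeptide copies.
toy_dehydrin = "MA" + "SSSSSS" + "DEEGAGKKKK" + "KIKEKLPG" + "AAAA" + "RMKDKIPG"
print(dehydrin_features(toy_dehydrin))
```

A sequence lacking the serine segment but carrying the octapeptide would match the exception the text notes for pea dehydrins.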

[1] Close T.J., Kortt A.A., Chandler P.M. Plant Mol. Biol. 13:95-108(1989).

[2] Robertson M., Chandler P.M. Plant Mol. Biol. 19:1031-1044(1992).

[3] Dure L. III, Crouch M., Harada J., Ho T.-H.D., Mundy J., Quatrano R., Thomas T., Sung Z.R. Plant Mol. Biol.
12:475-486(1989).

[0505] 158. (deoR) Bacterial regulatory proteins, deoR family signature

The many bacterial transcription regulation proteins which bind DNA through a 'helix-turn-helix' motif can be classified
into subfamilies on the basis of sequence similarities. One of these subfamilies groups the following proteins [1,2]: -
accR, the Agrobacterium tumefaciens plasmid pTiC58 repressor of opine catabolism and conjugal transfer. - agaR,
the Escherichia coli aga operon putative repressor. - deoR, the Escherichia coli deoxyribose operon repressor. - fucR,
the Escherichia coli L-fucose operon activator. - gatR, the Escherichia coli galactitol operon repressor. - glpR, the
Escherichia coli glycerol-3-phosphate regulon repressor. - gutR (or srlR), the Escherichia coli glucitol operon repressor.
- iolR, from Bacillus subtilis. - lacR, the streptococci lactose phosphotransferase system repressor. - spoIIID, the Bacillus
subtilis transcription regulator of the sigK gene. - yfjR, an Escherichia coli hypothetical protein. - ygbI, an Escherichia
coli hypothetical protein. - yihW, an Escherichia coli hypothetical protein. - yjfQ, an Escherichia coli hypothetical protein.
- yjhJ, an Escherichia coli hypothetical protein. The 'helix-turn-helix' DNA-binding motif of these proteins is located in
the N-terminal part of the sequence. The pattern used to detect these proteins starts fourteen residues before the HTH
motif and ends one residue after it.

[0506] Consensus pattern: R-x(3)-[LIVM]-x(3)-[LIVM]-x(16,17)-[STA]-x(2)-T-[LIVMA]-[RH]-[KRNA]-D-[LIVMF]-

[ 1] von Bodman S., Hayman G.T., Farrand S.K. Proc. Natl. Acad. Sci. U.S.A. 89:643-647(1992).

[ 2] Bairoch A. Unpublished observations (1993).

[0507] 159. dsrm
Double-stranded RNA binding motif
[1] Burd CG, Dreyfuss G; Medline: 94310455, Conserved structures and diversity of functions of RNA-binding proteins.
Science 1994;265:615-621.

[0508] Sequences gathered for seed by HMM_iterative_training. Putative motif shared by proteins that bind to dsRNA.
At least some DSRM proteins seem to bind to specific RNA targets. Exemplified by Staufen, which is involved in
localization of at least five different mRNAs in the early Drosophila embryo. Also by interferon-induced protein kinase
in humans, which is part of the cellular response to dsRNA.
[0509] Number of members: 116 
[0510] 160. Dynamin family signature 

Dynamin [1,2] is a microtubule-associated force-producing protein of 100 Kd which is involved in the production of
microtubule bundles and which is able to bind and hydrolyze GTP. Dynamin is structurally related to the following
proteins: - Drosophila shibire protein (gene shi) [3]. Shibire is, very probably, the Drosophila cognate of mammalian
dynamin. It seems to provide the motor for vesicular transport during endocytosis. - Yeast vacuolar sorting protein
VPS1 (or SPO15) [4], a protein which could also be involved in microtubule-associated motility. - Yeast protein MGM1
[5], which is required for mitochondrial genome maintenance. - Yeast protein DNM1, which is involved in endocytosis.
- Interferon induced Mx proteins [6,7]. Interferon alpha or beta induce the synthesis of a family of closely related proteins.
Most of these proteins are known to confer resistance to influenza viruses and/or rhabdoviruses on transfected mammalian
cells in culture. The three motifs found in all GTP-binding proteins are located in the N-terminal part of these
proteins. The signature pattern that was developed for these proteins is based on a highly conserved region downstream
of the ATP/GTP-binding motif 'A' (P-loop) (see <PDOC00017>).
[0511] Consensus pattern: L-P-[RK]-G-[STN]-[GN]-[LIVM]-V-T-R-

[ 1] Vallee R.B., Shpetner H.S. Annu. Rev. Biochem. 59:909-932(1990).

[ 2] Obar R.A., Collins C.A., Hammarback J.A., Shpetner H.S., Vallee R.B. Nature 347:256-261(1990).
[ 3] van der Bliek A., Meyerowitz E.M. Nature 351:411-414(1991).

[ 4] Rothman J.H., Raymond C.K., Gilbert T., O'Hara P.J., Stevens T.H. Cell 61:1063-1074(1990).

[ 5] Jones B.A., Fangman W.L. Genes Dev. 6:380-389(1992).

[ 6] Arnheiter H., Meier E. New Biol. 2:851-857(1990).

[ 7] Staeheli P., Pitossi F., Pavlovic J. Trends Cell Biol. 3:268-272(1993).
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[0512] 161. (dynamin_2) Dynamin central region 

[0513] This region lies between the GTPase domain, see dynamin, and the pleckstrin homology (PH) domain.
[0514] 162. E1-E2 ATPases phosphorylation site

E1-E2 ATPases (also known as P-type) are cation transport ATPases which form an aspartyl phosphate intermediate
in the course of ATP hydrolysis. ATPases which belong to this family are listed below [1,2,3]. - Fungal and plant plasma
membrane (H+) ATPases [reviewed in 4]. - Vertebrate (Na+, K+) ATPases (sodium pump) [reviewed in 5,6]. - Gastric
(K+, H+) ATPases (proton pump). - Calcium (Ca++) ATPases (calcium pump) from the sarcoplasmic reticulum (SR),
the endoplasmic reticulum (ER) and the plasma membrane. - Copper (Cu++) ATPases (copper pump) which are involved
in two human genetic disorders: Menkes syndrome and Wilson disease [7]. - Bacterial potassium (K+) ATPases.
- Bacterial cadmium efflux (Cd++) ATPases [reviewed in 8]. - Bacterial magnesium (Mg++) ATPases. - A probable
cation ATPase from Leishmania. - fixI, a probable cation ATPase from Rhizobium meliloti, involved in nitrogen fixation.
The region around the phosphorylated aspartate residue is perfectly conserved in all these ATPases and can be used
as a signature pattern.

[0515] Consensus pattern: D-K-T-G-T-[LI]-[TI] [D is phosphorylated] 
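Because the first residue of this signature is the phosphorylated aspartate, the start of each match gives the position of the putative phosphorylation site directly. A minimal sketch under that reading follows; the function name and the test sequence are hypothetical.

```python
import re

# D-K-T-G-T-[LI]-[TI]; the leading D is the phosphorylated aspartate.
P_TYPE_SITE = re.compile(r"DKTGT[LI][TI]")


def phosphorylated_asp_positions(sequence: str):
    """Return the 1-based position of the aspartate in every signature match."""
    return [m.start() + 1 for m in P_TYPE_SITE.finditer(sequence)]


# Hypothetical fragment containing one signature.
toy_atpase = "AAAADKTGTLTGG"
print(phosphorylated_asp_positions(toy_atpase))
```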

[ 1] Green N.M., McLennan D.H. Biochem. Soc. Trans. 17:819-822(1989).
[ 2] Green N.M. Biochem. Soc. Trans. 17:970-972(1989).
[ 3] Fagan M.J., Saier M.H. Jr. J. Mol. Evol. 38:57-99(1994).
[ 4] Serrano R. Biochim. Biophys. Acta 947:1-28(1988).
[ 5] Fambrough D.M. Trends Neurosci. 11:325-328(1988).
[ 6] Sweadner K.J. Biochim. Biophys. Acta 988:185-220(1989).
[ 7] Bull P.C., Cox D.W. Trends Genet. 10:246-251(1994).
[ 8] Silver S., Nucifora G., Chu L., Misra T.K. Trends Biochem. Sci. 14:76-80(1989).

[0516] 163. E1_N

E1 Protein, N terminal domain 
Number of members: 90 

[0517] 164. (E1_dehydrog) Dehydrogenase E1 component 

[0518] This family uses thiamine pyrophosphate as a cofactor. This family includes pyruvate dehydrogenase,
2-oxoglutarate dehydrogenase and 2-oxoisovalerate dehydrogenase.
[0519] 165. (ECH) Enoyl-CoA hydratase/isomerase signature 

Enoyl-CoA hydratase (EC 4.2.1.17) (ECH) [1] and 3-2trans-enoyl-CoA isomerase (EC 5.3.3.8) (ECI) [2] are two enzymes
involved in fatty acid metabolism. ECH catalyzes the hydration of 2-trans-enoyl-CoA into 3-hydroxyacyl-CoA
and ECI shifts the 3-double bond of the intermediates of unsaturated fatty acid oxidation to the 2-trans position. Most
eukaryotic cells have two fatty-acid beta-oxidation systems, one located in mitochondria and the other in peroxisomes.
In mitochondria, ECH and ECI are separate yet structurally related monofunctional enzymes. Peroxisomes contain a
trifunctional enzyme [3] consisting of an N-terminal domain that bears both ECH and ECI activity, and a C-terminal
domain responsible for 3-hydroxyacyl-CoA dehydrogenase (HCDH) activity. In Escherichia coli (gene fadB) and
Pseudomonas fragi (gene faoA), ECH and ECI are also part of a multifunctional enzyme which contains both a HCDH and
a 3-hydroxybutyryl-CoA epimerase domain [4]. A number of other proteins have been found to be evolutionarily related
to the ECH/ECI enzymes or domains: - 3-hydroxybutyryl-CoA dehydratase (EC 4.2.1.55) (crotonase), a bacterial enzyme
involved in the butyrate/butanol-producing pathway. - Naphthoate synthase (EC 4.1.3.36) (DHNA synthetase) (gene
menB) [5], a bacterial enzyme involved in the biosynthesis of menaquinone (vitamin K2). DHNA synthetase converts
O-succinyl-benzoyl-CoA (OSB-CoA) to 1,4-dihydroxy-2-naphthoic acid (DHNA). - 4-chlorobenzoate dehalogenase
(EC 3.8.1.6) [6], a Pseudomonas enzyme which catalyzes the conversion of 4-chlorobenzoate-CoA to
4-hydroxybenzoate-CoA. - A Rhodobacter capsulatus protein of unknown function (ORF257) [7]. - Bacillus subtilis putative polyketide
biosynthesis proteins pksH and pksI. - Escherichia coli carnitine racemase (gene caiD) [8]. - Escherichia coli hypothetical
protein ygfG. - Yeast hypothetical protein YDR036c. As a signature pattern for these enzymes, a conserved region
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[0520] Consensus pattern. IUvmj i 


I0520] 

[UVMFY]- 




MFY1 " k mT Eur J Biochem. 165:73-76(1989). 

I «l ^ « J "^^ 7 "l 9 S„ „ , SOW- J.O.. Chans K,H.. U«, P.-H. 

[ 5] Dnscoll J.R-. Taber n. charesl H., Sylvestre wi... 

6 Babbitt P.C Ke o n V°^ e L mis ^ , 3v5594-5604( 1 992). 
Lnaway-Mariano O^^J 107;171 . 172(1991) . M)crobiol . 13:77 5-786(1994). 

|SS a K . Buchet A.. K.eber H,R. Mandrand-Berthe.ot M. 

l0521l iee, E nBD)E,on 9 a.n^ 

Eukaiotic elongation ^'^ZlZs: the a.pna chain which tne beta and de.ta (or beta) 

fhains. The beta and deita n*«n. - a P re hy d,ophi.ic proteins ^^ 3 J ^ invo.ved in the 

alpha chain lor.GTP 2 . ^f^^fexchange activity, while the N-^^h, de velo P ed. The first corre- 

Consensuspattern.llVl-QSXui 15 - 4 20-424(1990). 

I 2] van Damme H.T.F., Amons n., 
1050:241-247(1990). 

10529] Consensus pa tern L-R-xTO ^ .fQ^.T-D-F-V-lSAMKRNl- 
so [0530] Consensus pattern. E-ILIVM] IN >io^MQ92^ 

• oui Gudkov A T.Biochimie 74:419-425(1992). 

I § rr^vtBu^har, W.A.. Spremu,, L.L. ^- B -^-^^^~^^ : ^ 

E^to^^ 
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[0533] Paccaud JP, Thomas DY, Bergeron JJ, Nilsson T, J Cell Biol 1998;140:751-765. 

172. ENV_polyprotein

ENV polyprotein (coat polyprotein)

Number of members: 224

[0534] 173. (ERG4_ERG24) Ergosterol biosynthesis ERG4/ERG24 family signatures

Two fungal enzymes involved in ergosterol biosynthesis and which act by reducing double bonds in precursors of
ergosterol have been shown to be evolutionarily related [1]. These are C-14 sterol reductase (gene ERG24 in budding
yeast and erg3 in Neurospora crassa) and C-24(28) sterol reductase (gene ERG4 in budding yeast and sts1 in fission
yeast). Their sequences are also highly related to that of chicken lamin B receptor, which is thought to anchor the
lamina to the inner nuclear membrane. These proteins are highly hydrophobic and seem to contain seven or eight
transmembrane regions. As signature patterns, two conserved regions were selected. The first one is apparently
located in a loop between the fourth and fifth transmembrane regions and the second is in the C-terminal section.
[0535] Consensus pattern: G-x(2)-[LIVM]-[YH]-D-x-[FYW]-x-G-x(2)-L-N-P-R-
Consensus pattern: [LIVM](2)-H-R-x(2)-R-D-x(3)-C-x(2)-K-Y-G-

[ 1] Lai M.H., Bard M., Pierson C.A., Alexander J.F., Goebl M., Carter G.T., Kirsch D.R. Gene 140:41-49(1994).
[0536] 174. (ERM) Ezrin/radixin/moesin family

[0537] This family of proteins contains a band 4.1 domain (Band_41) at its amino terminus. This family represents
the rest of these proteins.

[0538] [1] Yonemura S, Hirao M, Doi Y, Takahashi N, Kondo T, Tsukita S, J Cell Biol 1998;140:885-895.

[0539] 175. ER lumen protein retaining receptor signatures

Proteins that reside in the lumen of the endoplasmic reticulum (ER) contain a C-terminal tetrapeptide (generally
K-D-E-L or H-D-E-L) that serves as a signal for their retrieval (retrograde transport) from subsequent compartments of the
secretory pathway. The signal is recognized by a receptor molecule that is believed to cycle between the cis side of
the Golgi apparatus and the ER [1]. This protein is known as the ER lumen protein retaining receptor or also as the
'KDEL receptor'. It has been characterized in a variety of species, including fungi (gene ERD2), plants, Plasmodium,
Drosophila and mammals. In mammals two highly related forms of the receptor are known. Structurally, the receptor
is a protein of about 220 residues that seems to contain seven transmembrane regions [2]. The N-terminal part (3
residues) is oriented toward the lumen while the C-terminal tail (about 12 residues) is cytoplasmic. There are three
lumenal and three cytoplasmic loops. Two signature patterns for these receptors were developed. The first pattern
corresponds to the C-terminal half of the first cytoplasmic loop as well as most of the second transmembrane domain.
The second pattern is a perfectly conserved decapeptide that corresponds to the central part of the fifth transmembrane
domain.

[0540] Consensus pattern: G-I-S-x-[KR]-x-Q-x-L-[FY]-x-[LIV](2)-F-x(2)-R-Y-
Consensus pattern: L-E-[SA]-V-A-I-[LM]-P-Q-L-
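The retrieval signal recognized by this receptor is a C-terminal tetrapeptide, so a candidate ER-resident protein can be screened by inspecting only its last four residues. The sketch below is illustrative only: the function name and the default signal list (K-D-E-L and H-D-E-L, the two forms named in the text) are assumptions, and other variants are not covered.

```python
def has_er_retrieval_signal(sequence: str, signals=("KDEL", "HDEL")) -> bool:
    """Return True if the sequence ends with one of the given tetrapeptides."""
    # str.endswith accepts a tuple and checks each suffix in turn.
    return sequence.upper().endswith(tuple(signals))


# Hypothetical sequences for illustration.
print(has_er_retrieval_signal("MKLLAAGGSEKDEL"))
print(has_er_retrieval_signal("MKDELLAAGGSE"))
```

The second call shows why the check is anchored to the C-terminus: an internal K-D-E-L does not count as a signal.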


35 


[ 1] Pelham H.R.B. Curr. Opin. Cell Biol. 3:585-591(1991). 

[ 2] Townsley F.M., Wilson D.W., Pelham H.R.B. EMBO J. 12:2821-2829(1993). 

[0541] 176. (ETF_beta) Electron transfer flavoprotein beta-subunit signature 

The electron transfer flavoprotein (ETF) [1,2] serves as a specific electron acceptor for various mitochondrial
dehydrogenases. ETF transfers electrons to the main respiratory chain via ETF-ubiquinone oxidoreductase. ETF is a
heterodimer that consists of an alpha and a beta subunit and which binds one molecule of FAD per dimer. A similar system
also exists in some bacteria. The beta subunit of ETF is a protein of about 28 Kd which is structurally related to the
bacterial nitrogen fixation protein fixA which could play a role in a redox process and feed electrons to ferredoxin. Other
related proteins are: - Escherichia coli hypothetical protein ydiQ. - Escherichia coli hypothetical protein ygcR. As a
signature pattern for these proteins, a conserved region which is located in the central section was selected.
[0542] Consensus pattern: [IVA]-x-[KR]-x(2)-[DE]-[GD]-[GDE]-x(1,2)-[EQ]-x-[LIV]-x(4)-P-x-[LIVM](2)-[TAC]-

[ 1] Finocchiaro G., Ikeda Y., Ito M., Tanaka K. Prog. Clin. Biol. Res. 321:637-652(1990).
[ 2] Tsai M.H., Saier M.H. Jr. Res. Microbiol. 146:397-404(1995).

[0543] 177. Endonuclease III signatures 

Escherichia coli endonuclease III (EC 4.2.99.18) (gene nth) [1] is a DNA repair enzyme that acts both as a DNA
N-glycosylase, removing oxidized pyrimidines from DNA, and as an apurinic/apyrimidinic (AP) endonuclease, introducing
a single-strand nick at the site from which the damaged base was removed. Endonuclease III is an iron-sulfur protein
that binds a single 4Fe-4S cluster. The 4Fe-4S cluster does not seem to be important for catalytic activity, but is probably
involved in the proper positioning of the enzyme along the DNA strand [2]. Endonuclease III is evolutionarily related to
the following proteins: - Fission yeast endonuclease III homolog (gene nth1) [3]. - Escherichia coli and related protein
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Fission yeast hypothetical pro em SpAC26A3.02. . g bQund fey , QUr steine which i are 
the iron-suHur binding domain, the second P 

enzymes. IKRS1 .p. IK RAGL]-C-x(2)-C-x(5)-C (The tour C , '° (3H livM]-x(2MSALV]- 

l0544] consensus ^ X(3) ' 
Consensus pattern: IGST]-x-luvMrj 

1 41 Noelling J., van Eeden F.J.M.. Eggen m. . 

0546] This family ol proteins utilize n a mm Biochemistry 1997,36: 

SSTS MCI - _ 0. -=. -V - Ho»n ' — 

6294-6304. , T anc j t^e epsilon subunit o1 DNA 




[0549] - 

f0S y S0] er m Koonin EV. Deutscher MP. Nude, Acids Res 1993;21:2521-2522 


30 [05501 
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[0551] 1 80. ENTH (den tif ication ol a novel domain shared 

EL?£E£S « ENTH «- . * unknown. 

Eukaryotic initiation factor 5A ^^^p* to pro mote the formation ol the first peptide d a] 

[0559] Consensus pattern: [PTJ-G-K-H-Gx a i . 
(1991). 
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[0560] 183 (ef hand) S-100/ICaBP type calcium binding protein signature 

S-100 are small dimeric acidic calcium and zinc-binding proteins [1] abundant in the brain. They have two different
types of calcium-binding sites: a low affinity one with a special structure and a 'normal' EF-hand type high affinity site.
The vitamin-D dependent intestinal calcium-binding proteins (ICaBP or calbindin 9 Kd) also belong to this family of
proteins, but they do not form dimers. In the past years the sequences of many new members of this family have been
determined (for reviews see [2,3,4]); in most cases the function of these proteins is not yet known, although it is
becoming clear that they are involved in cell growth and differentiation, cell cycle regulation and metabolic control. These
proteins are: - Calcyclin (prolactin receptor associated protein (PRA); clatropin; 2a9; 5B10; S100A6). - Calpactin I light chain (p10;
p11; 42c; S100A10). - Calgranulin A (cystic fibrosis antigen (CFAg); MIF related protein 8 (MRP-8); p8; S100A8). -
Calgranulin B (MIF related protein 14 (MRP-14); p14; S100A9). - Calgranulin C. - Calgizzarin (S100C). - Placental
calcium-binding protein (CAPL) (18a2; peL98; 42a; p9K; MTS1; metastatic S100A4). - Protein S-100D (S100A5). -
Protein S-100E (S100A3). - Protein S-100L (CAN19; S100A2). - Placental protein S-100P (S100E). - Psoriasin
(S100A7). - Chemotactic cytokine CP-10 [5]. - Protein MRP-126 [6]. - Trichohyalin [7]. This is a large intermediate
filament-associated protein that associates with keratin intermediate filaments (KIF); it contains a S-100 type domain
in its N-terminal extremity. A number of these proteins are known to bind calcium while others are not (p10 for example).
Our EF-hand detecting pattern will fail to pick those proteins which have lost their calcium-binding properties. A pattern
was developed which unambiguously picks up proteins belonging to this family. This pattern spans the region of the
EF-hand high affinity site but makes no assumptions on the calcium-binding properties of this site.

[0561] Consensus pattern: [LIVMFYW](2)-x(2)-[LK]-D-x(3)-[DN]-x(3)-[DNSG]-[FY]-x-[ES]-[FYVC]-x(2)-[LIVMFS]-[LIVMF]

[ 1] Baudier J. (In) Calcium and Calcium Binding proteins, Gerday C., Bolis L., Gilles R., Eds., pp102-113, Springer
Verlag, Berlin, (1988).

[ 2] Moncrief N.D., Kretsinger R.H., Goodman M. J. Mol. Evol. 30:522-562(1990).

[ 3] Kligman D., Hilt D.C. Trends Biochem. Sci. 13:437-443(1988).

[ 4] Schaefer B.W., Wicki R., Engelkamp D., Mattei M.-G., Heizmann C.W. Genomics 25:638-643(1995).
[ 5] Lackmann M., Cornish C.J., Simpson R.J., Moritz R.L., Geczy C.L. J. Biol. Chem. 267:7499-7504(1992).
[ 6] Nakano T., Graf T. Oncogene 7:527-534(1992).

[ 7] Lee S.-C., Kim I.-G., Marekov L.N., O'Keefe E.J., Parry D.A.D., Steinert P.M. J. Biol. Chem. 268:12164-12176
(1993).

EF-hand calcium-binding domain 

Many calcium-binding proteins belong to the same evolutionary family and share a type of calcium-binding domain
known as the EF-hand [1 to 5]. This type of domain consists of a twelve residue loop flanked on both sides by a twelve
residue alpha-helical domain. In an EF-hand loop the calcium ion is coordinated in a pentagonal bipyramidal
configuration. The six residues involved in the binding are in positions 1, 3, 5, 7, 9 and 12; these residues are denoted by X,
Y, Z, -Y, -X and -Z. The invariant Glu or Asp at position 12 provides two oxygens for liganding Ca (bidentate ligand).
Listed below are the proteins which are known to contain EF-hand regions. For each type of protein the total number
of EF-hand regions known or supposed to exist is indicated in parentheses. This number does not include regions
which clearly have lost their calcium-binding properties, or the atypical low-affinity site (which spans thirteen residues)
found in the S-100/ICaBP family of proteins [6].

- Aequorin and Renilla luciferin binding protein (LBP) (Ca=3).
- Alpha actinin (Ca=2).
- Calbindin (Ca=4).
- Calcineurin B subunit (protein phosphatase 2B regulatory subunit) (Ca=4).
- Calcium-binding protein from Streptomyces erythraeus (Ca=3?).
- Calcium-binding protein from Schistosoma mansoni (Ca=2?).
- Calcium-binding proteins TCBP-23 and TCBP-25 from Tetrahymena thermophila (Ca=4?).
- Calcium-dependent protein kinases (CDPK) from plants (Ca=4).
- Calcium vector protein from amphioxus (Ca=2).
- Calcyphosin (thyroid protein p24) (Ca=4?).
- Calmodulin (Ca=4, except in yeast where Ca=3).
- Calpain small and large chains (Ca=2).
- Calretinin (Ca=6).
- Calcyclin (prolactin receptor associated protein) (Ca=2).
- Caltractin (centrin) (Ca=2 or 4).
- Cell Division Control protein 31 (gene CDC31) from yeast (Ca=2?).




- Diacylglycerol kinase (EC 2.7.1.107) (DGK) (Ca=2).
- FAD-dependent glycerol-3-phosphate dehydrogenase (EC 1.1.99.5) from mammals (Ca=1).
- Fimbrin (plastin) (Ca=2).
- Flagellar calcium-binding protein (1f8) from Trypanosoma cruzi (Ca=1 or 2).
- Guanylate cyclase activating protein (GCAP) (Ca=3).
- Inositol phospholipid-specific phospholipase C isozymes gamma-1 and delta-1 (Ca=2) [10].
- Intestinal calcium-binding protein (ICaBPs) (Ca=2).
- MIF related proteins 8 (MRP-8 or CFAG) and 14 (MRP-14) (Ca=2).
- Myosin regulatory light chains (Ca=1).
- Oncomodulin (Ca=2).
- Osteonectin (basement membrane protein BM-40) (SPARC) and proteins that contain an 'osteonectin' domain
(QR1, matrix glycoprotein SC1) (see the entry <PDOC00535>) (Ca=1).
- Parvalbumins alpha and beta (Ca=2).
- Placental calcium-binding protein (18a2) (nerve growth factor induced protein 42a) (p9k) (Ca=2).
- Recoverins (visinin, hippocalcin, neurocalcin, S-modulin) (Ca=2 to 3).
- Reticulocalbin (Ca=4).
- S-100 protein, alpha and beta chains (Ca=2).
- Sarcoplasmic calcium-binding protein (SCPs) (Ca=2 to 3).
- Sea urchin proteins Spec 1 (Ca=4), Spec 2 (Ca=4?), Lps-1 (Ca=8).
- Serine/threonine protein phosphatase rdgc (EC 3.1.3.16) from Drosophila (Ca=2).
- Sorcin V19 from hamster (Ca=2).
- Spectrin alpha chain (Ca=2).
- Squidulin (optic lobe calcium-binding protein) from squid (Ca=4).
- Troponins C; from skeletal muscle (Ca=4), from cardiac muscle (Ca=3), from arthropods and molluscs (Ca=2).

There have been a number of attempts [7,8] to develop patterns that pick up EF-hand regions, but these studies were
made a few years ago when not so many different families of calcium-binding proteins were known. Therefore a new
pattern was developed which takes into account all published sequences. This pattern includes the complete EF-hand
loop as well as the first residue which follows the loop and which seems to always be hydrophobic.

- Consensus pattern: D-x-[DNS]-{ILVFYW}-[DENSTG]-[DNQGHRK]-{GP}-[LIVMC]-[DENQSTAGC]-x(2)-[DE]-
[LIVMFYW]
- Note: positions 1 (X), 3 (Y) and 12 (-Z) are the most conserved.
- Note: the 6th residue in an EF-hand loop is, in most cases, a Gly, but the number of exceptions to this 'rule' has
gradually increased and therefore the pattern should include all the different residues which have been shown to
exist in this position in functional Ca-binding sites.
- Note: the pattern will, in some cases, miss one of the EF-hand regions in some proteins with multiple EF-hand
domains.


[ 1] Kawasaki H., Kretsinger R.H. Protein Prof. 2:305-490(1995).
[ 2] Kretsinger R.H. Cold Spring Harbor Symp. Quant. Biol. 52:499-510(1987).
[ 3] Moncrief N.D., Kretsinger R.H., Goodman M. J. Mol. Evol. 30:522-562(1990).
[ 4] Nakayama S., Moncrief N.D., Kretsinger R.H. J. Mol. Evol. 34:416-448(1992).
[ 5] Heizmann C.W., Hunziker W. Trends Biochem. Sci. 16:98-103(1991).
[ 6] Kligman D., Hilt D.C. Trends Biochem. Sci. 13:437-443(1988).
[ 7] Strynadka N.C.J., James M.N.G. Annu. Rev. Biochem. 58:951-98(1989).
[ 8] Haiech J., Sallantin J. Biochimie 67:555-560(1985).
[ 9] Chauvaux S., Beguin P., Aubert J.-P., Bhat K.M., Gow L.A., Wood T.M., Bairoch A. Biochem. J. 265:261-265(1990).
[10] Bairoch A., Cox J.A. FEBS Lett. 269:454-456(1990).
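
The consensus patterns in this section use the PROSITE pattern syntax, which maps closely onto regular expressions. The following is a minimal sketch, not part of the original entry, of how such a pattern could be translated and used to scan a sequence for EF-hand loops. The function names `prosite_to_regex` and `find_ef_hand_loops` and the test peptide are illustrative assumptions.

```python
import re


def prosite_to_regex(pattern: str) -> str:
    """Translate a PROSITE-style pattern into a Python regex string.

    Handles the elements used in this section: single residues,
    x (any residue), [..] allowed sets, {..} forbidden sets, and
    repeat counts such as (2) or (1,2).
    """
    parts = []
    for element in pattern.replace(" ", "").strip("-").split("-"):
        # Separate the residue specification from an optional repeat count.
        m = re.fullmatch(r"(.+?)(?:\((\d+)(?:,(\d+))?\))?", element)
        core, low, high = m.groups()
        if core == "x":
            token = "."                          # any residue
        elif core.startswith("{") and core.endswith("}"):
            token = "[^" + core[1:-1] + "]"      # forbidden residues
        else:
            token = core                         # literal residue or [..] set
        if low and high:
            token += "{" + low + "," + high + "}"
        elif low:
            token += "{" + low + "}"
        parts.append(token)
    return "".join(parts)


# EF-hand consensus pattern as given in the text above.
EF_HAND = ("D-x-[DNS]-{ILVFYW}-[DENSTG]-[DNQGHRK]-{GP}-"
           "[LIVMC]-[DENQSTAGC]-x(2)-[DE]-[LIVMFYW]")
EF_HAND_RE = re.compile(prosite_to_regex(EF_HAND))


def find_ef_hand_loops(sequence: str):
    """Return (1-based start, matched loop) for each non-overlapping hit."""
    return [(m.start() + 1, m.group()) for m in EF_HAND_RE.finditer(sequence)]


# Hypothetical test peptide containing one loop that satisfies the pattern.
peptide = "AAFDKDGDGTITTKELGG"
print(find_ef_hand_loops(peptide))
```

Because `finditer` reports only non-overlapping matches, and because the pattern itself can miss loops (see the last note above), the number of hits is best read as a lower bound on the number of EF-hand regions in a protein.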

[0562] 184. Enolase signature

Enolase (EC 4.2.1.11) is a glycolytic enzyme that catalyzes the dehydration of 2-phospho-D-glycerate to
phosphoenolpyruvate [1]. It is a dimeric enzyme that requires magnesium both for catalysis and for stabilizing the
dimer. Enolase is probably found in all organisms that metabolize sugars. In vertebrates, there are three different
tissue-specific isozymes: alpha present in most tissues, beta in muscles and gamma found only in nervous tissues.
Tau-crystallin, one of the major lens proteins in some fish, reptiles and birds, has been shown [2] to be
evolutionarily related to enolase. As a signature pattern for enolase, the best conserved region was selected; it is
located in the C-terminal third of the sequence.

[0563] Consensus pattern: [LIV](3)-K-x-N-Q-I-G-[ST]-[LIV]-[ST]-[DE]-[STA]
[ 1] Lebioda L., Stec B., Brewer J.M. J. Biol. Chem. 264:3685-3693(1989).
[ 2] Wistow G., Piatigorsky J. Science 236:1554-1556(1987).
[0564] 185. (F-actin_cap_A) F-actin capping protein alpha subunit signatures

The F-actin capping protein binds in a calcium-independent manner to the fast growing ends of actin filaments (barbed
end) thereby blocking the exchange of subunits at these ends. Unlike gelsolin and severin this protein does not sever
actin filaments. The F-actin capping protein is a heterodimer composed of two unrelated subunits: alpha and beta. The
alpha subunit is a protein of about 268 to 286 amino acid residues whose sequence is well conserved in eukaryotic
species [1]. As signature patterns two highly conserved regions in the C-terminal section of the alpha subunit were
selected.

[0565] Consensus pattern: V-H-[FY](2)-E-D-G-N-V

Consensus pattern: F-K-[AE]-L-R-R-x-L-P-

[0566] [ 1] Cooper J.A., Caldwell J.E., Gattermeir D.J., Torres M.A., Amatruda J.F., Casella J.F. Cell Motil.
Cytoskeleton 18:204-214(1991).
[0567] 186. F-box domain 

[0568] [1] Bai C, Sen P, Hofmann K, Ma L, Goebl M, Harper JW, Elledge SJ, Cell 1996;86:263-274. [2] Skowyra D,
Craig KL, Tyers M, Elledge SJ, Harper JW, Cell 1997;91:209-219.
[0569] 187. F-protein

Negative factor (F-protein), or Nef.

[0570] [1] Arold S, Franken P, Strub M-P, Hoh F, Benichou S, Benarous R, Dumas C; Medline: 98035457. The crystal
structure of HIV-1 Nef protein bound to the Fyn kinase SH3 domain suggests a role for this complex in altered T cell
receptor signalling. Structure 1997;5:1361-1372.
[0571] Nef protein accelerates virulent progression of AIDS by its interaction with cellular proteins involved in signal
transduction and host cell activation. Nef has been shown to bind specifically to a subset of the Src kinase family.
[0572] Number of members: 1013 
[0573] 188. (FAD_binding_2)

Fumarate reductase / succinate dehydrogenase FAD-binding site. In bacteria two distinct, membrane-bound, enzyme
complexes are responsible for the interconversion of fumarate and succinate (EC 1.3.99.1): fumarate reductase (Frd)
is used in anaerobic growth, and succinate dehydrogenase (Sdh) is used in aerobic growth. Both complexes consist
of two main components: a membrane-extrinsic component composed of a FAD-binding flavoprotein and an iron-sulfur
protein; and a hydrophobic component composed of a membrane anchor protein and/or a cytochrome B.
[0574] In eukaryotes mitochondrial succinate dehydrogenase (ubiquinone) (EC 1.3.5.1) is an enzyme composed of
two subunits: a FAD flavoprotein and an iron-sulfur protein.

[0575] The flavoprotein subunit is a protein of about 60 to 70 Kd to which FAD is covalently bound to a histidine
residue which is located in the N-terminal section of the protein [1]. The sequence around that histidine is well conserved
in Frd and Sdh from various bacterial and eukaryotic species [2] and can be used as a signature pattern.
[0576] Consensus pattern: R-[ST]-H-[ST]-x(2)-A-x-G-G [H is the FAD binding site] Sequences known to belong to this
class detected by the pattern ALL.

[ 1] Blaut M., Whittaker K., Valdovinos A., Ackrell B.A., Gunsalus R.P., Cecchini G. J. Biol. Chem. 264:13599-13604
(1989).

[ 2] Birch-Machin M.A., Farnsworth L., Ackrell B.A., Cochran B., Jackson S., Bindoff L.A., Aitken A., Diamond A.G.,
Turnbull D.M. J. Biol. Chem. 267:11553-11558(1992).

[0577] 189. Fatty acid desaturases signatures (FA_desaturase)

Fatty acid desaturases (EC 1.14.99.-) are enzymes that catalyze the insertion of a double bond at the delta position
of fatty acids. There seem to be two distinct families of fatty acid desaturases which do not seem to be evolutionarily
related.

Family 1 is composed of:
- Stearoyl-CoA desaturase (SCD) (EC 1.14.99.5) [1]. SCD is a key regulatory enzyme of unsaturated fatty acid
  biosynthesis. SCD introduces a cis double bond at the delta(9) position of fatty acyl-CoAs such as palmitoleoyl-
  and oleoyl-CoA. SCD is a membrane-bound enzyme that is thought to function as a part of a multienzyme complex
  in the endoplasmic reticulum of vertebrates and fungi.
As a signature pattern for this family a conserved region in the C-terminal part of these enzymes was selected; this
region is rich in histidine residues and in aromatic residues.

Family 2 is composed of:
- Plants stearoyl-acyl-carrier-protein desaturase (EC 1.14.99.6) [2]; these enzymes catalyze the introduction of a
  double bond at the delta(9) position of stearoyl-ACP to produce oleoyl-ACP. This enzyme is responsible for the
  conversion of saturated fatty acids to unsaturated fatty acids in the synthesis of vegetable oils.
- Cyanobacteria desA [3], an enzyme that can introduce a second cis double bond at the delta(12) position of fatty
  acid bound to membrane glycerolipids. DesA is involved in chilling tolerance; the phase transition temperature of
  lipids of cellular membranes being dependent on the degree of unsaturation of fatty acids of the membrane lipids.
As a signature pattern for this family a conserved region in the C-terminal part of these enzymes was selected.

[0578] Consensus pattern: G-E-x-[FY]-H-N-[FY]-H-H-x-F-P-x-D-Y-
Consensus pattern: [ST]-[SA]-x(3)-[QR]-[LI]-x(5,6)-D-Y-x(2)-[LIVMFYW]-[LIVM]-[DE]-

[ 1] Kaestner K.H., Ntambi J.M., Kelly T.J. Jr., Lane M.D. J. Biol. Chem. 264:14755-14761(1989).
[ 2] Shanklin J., Somerville C.R. Proc. Natl. Acad. Sci. U.S.A. 88:2510-2514(1991).
[ 3] Wada H., Gombos Z., Murata N. Nature 347:200-203(1990).

[0579] 190. Fructose-1-6-bisphosphatase active site (FBPase)

Fructose-1,6-bisphosphatase (EC 3.1.3.11) (FBPase) [1], a regulatory enzyme in gluconeogenesis, catalyzes the
hydrolysis of fructose 1,6-bisphosphate to fructose 6-phosphate. It is involved in many different metabolic pathways
and found in most organisms. Sedoheptulose-1,7-bisphosphatase (EC 3.1.3.37) (SBPase) [2] is an enzyme found in plant
chloroplasts and in photosynthetic bacteria that catalyzes the hydrolysis of sedoheptulose 1,7-bisphosphate to
sedoheptulose 7-phosphate, a step in the Calvin reductive pentose phosphate cycle. It is functionally and structurally
related to FBPase. In mammalian FBPase, a lysine residue has been shown to be involved in the catalytic mechanism
[3]. The region around this residue is highly conserved and can be used as a signature pattern for FBPase and SBPase.
It must be noted that, in some bacterial FBPase sequences, the active site lysine is replaced by an arginine.
Consensus pattern: [AG]-[RK]-L-x(1,2)-[LIV]-[FY]-E-x(2)-P-[LIVM]-[GSA] [K/R is the active site residue]-

[ 1] Benkovic S.J., DeMaine M.M. Adv. Enzymol. 53:45-82(1982).
[ 2] Raines C.A., Lloyd J.C., Willingham N.M., Potts S., Dyer T.A. Eur. J. Biochem. 205:1053-1059(1992).
[ 3] Ke H., Thorpe C.M., Seaton B.A., Lipscomb W.N., Marcus F. J. Mol. Biol. 212:513-539(1989).
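
The FBPase pattern combines a variable-length gap, x(1,2), with an annotated active-site position that may be either lysine or arginine. As a rough sketch (the regular expression is a hand translation of the pattern above, and the function name and test peptides are hypothetical), a capture group can be used to report which residue occupies the active site in each hit:

```python
import re

# Hand translation of [AG]-[RK]-L-x(1,2)-[LIV]-[FY]-E-x(2)-P-[LIVM]-[GSA].
# The capture group isolates the annotated active-site residue (K or R).
FBPASE_RE = re.compile(r"[AG]([RK])L.{1,2}[LIV][FY]E.{2}P[LIVM][GSA]")


def fbpase_active_sites(sequence: str):
    """Return (1-based position, residue) of each putative active-site K/R."""
    return [(m.start(1) + 1, m.group(1)) for m in FBPASE_RE.finditer(sequence)]


# Hypothetical peptides: a lysine form, an arginine form, and a one-residue gap.
print(fbpase_active_sites("MTGKLRLLYECNPMAY"))
print(fbpase_active_sites("MTGRLRLLYECNPMAY"))
print(fbpase_active_sites("GKLALYEAAPLG"))
```

Reporting the residue alongside its position makes it easy to separate the lysine-type sequences from the bacterial arginine-type sequences mentioned in the text.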

[0580] 191. FGGY family of carbohydrate kinases signatures

It has been shown [1] that four different types of carbohydrate kinases seem to be evolutionarily related. These enzymes
are:
- L-fuculokinase (EC 2.7.1.51) (gene fucK).
- Gluconokinase (EC 2.7.1.12) (gene gntK).
- Glycerokinase (EC 2.7.1.30) (gene glpK).
- Xylulokinase (EC 2.7.1.17) (gene xylB).
- L-xylulose kinase (EC 2.7.1.53) (gene lyxK).
These enzymes are proteins of from 480 to 520 amino acid residues. As consensus patterns for this family of kinases two
conserved regions were selected, one in the central section, the other in the C-terminal section.
[0581] Consensus pattern: [MFYGS]-x-[PST]-x(2)-K-[LIVMFYW]-x-W-[LIVMF]-x-[DENQTKR]-[ENQH]-
Consensus pattern: [GSA]-x-[LIVMFYW]-x-G-[LIVM]-x(7,8)-[HDENQ]-[LIVMF]-x(2)-[AS]-[STAIVM]-[LIVMFY]-[DEQ]-

[0582] [ 1] Reizer A., Deutscher J., Saier M.H. Jr., Reizer J. Mol. Microbiol. 5:1081-1089(1991).
[0583] 192. FKBP-type peptidyl-prolyl cis-trans isomerase signatures/profile (FKBP)

FKBP [1,2,3] is the major high-affinity binding protein, in vertebrates, for the immunosuppressive drug FK506. It exhibits
peptidyl-prolyl cis-trans isomerase activity (EC 5.2.1.8) (PPIase or rotamase). PPIase is an enzyme that accelerates
protein folding by catalyzing the cis-trans isomerization of proline imidic peptide bonds in oligopeptides [4].

At least three different forms of FKBP are known in mammalian species:
- FKBP-12, which is cytosolic and inhibited by both FK506 and rapamycin.
- FKBP-13, which is membrane associated and inhibited by both FK506 and rapamycin.
- FKBP-25, which is preferentially inhibited by rapamycin.

These forms of FKBP are evolutionarily related and show extensive similarities [5,6,7] with the following proteins:
- Fungal FKBP.
- Mammalian hsp binding immunophilin (HBI) (also called p59). HBI is a protein which binds to hsp90 and contains
  two FKBP-like domains in its N-terminal section, the first of which seems to be functional.
- The C-terminal part of the cell-surface protein mip from Legionella; a protein associated with macrophage
  infection by an unknown mechanism.
- Escherichia coli slyD [8], a protein with an N-terminal FKBP domain followed by a histidine-rich metal-binding
  domain.
- Escherichia coli fkpA.
- Escherichia coli fklB (FKBP22).
- Escherichia coli slpA.
- Bacterial trigger factor (Tig).
- Streptomyces hygroscopicus and chrysomallus FK506-binding protein.
- Chlamydia trachomatis 27 Kd membrane protein.
- Neisseria meningitidis strain C114 PPIase.
- Probable PPIases from Haemophilus influenzae (HI0754), Methanococcus jannaschii (MJ0278 and MJ0825),
  Pseudomonas fluorescens and Pseudomonas aeruginosa.

Two signature patterns for these proteins were developed. One is based on a conserved region in the N-terminus of
FKBP, the other is located in the central section. The profile for FKBP spans the complete domain.

[0584] Consensus pattern: [LIVMC]-x-[YF]-x-[GVL]-x(1,2)-[LFT]-x(2)-G-x(3)-[DE]-[STAEQK]-[STAN]-
[0585] Consensus pattern: [LIVMFY]-x(2)-[GA]-x(3,4)-[LIVMF]-x(2)-[LIVMFHK]-x(2)-G-x(4)-[LIVMF]-x(3)-[PSGAQ]-
x(2)-[AG]-[FY]-G-

[ 1] Tropschug M., Wachter E., Mayer S., Schoenbrunner E.R., Schmid F.X. Nature 346:674-677(1990).
[ 2] Stein R.L. Curr. Biol. 1:234-236(1991).
[ 3] Siekierka J.J., Wiederrecht G., Greulich H., Boulton D., Hung S.H.Y., Cryan J., Hodges P.J., Sigal N.H. J. Biol.
Chem. 265:21011-21015(1990).
[ 4] Fischer G., Schmid F.X. Biochemistry 29:2205-2212(1990).
[ 5] Trandinh C.C., Pao G.M., Saier M.H. Jr. FASEB J. 6:3410-3420(1992).
[ 6] Galat A. Eur. J. Biochem. 216:689-707(1993).
[ 7] Hacker J., Fischer G. Mol. Microbiol. 10:445-456(1993).
[ 8] Wuelfing C., Lombardero J., Plueckthun A. J. Biol. Chem. 269:2895-2901(1994).

[0586] 193. MAPEG family (aka: FLAP/GST2/LTC4S family signature)
[0587] The following mammalian proteins are evolutionarily related [1]:

- Leukotriene C4 synthase (EC 2.5.1.37) (gene LTC4S), an enzyme that catalyzes the production of LTC4 from LTA4.
- Microsomal glutathione S-transferase II (EC 2.5.1.18) (GST-II) (gene GST2), an enzyme that can also produce
  LTC4 from LTA4.
- 5-lipoxygenase activating protein (gene FLAP), a protein that seems to be required for the activation of
  5-lipoxygenase.

[0588] These are proteins of 150 to 160 residues that contain three transmembrane segments. As a signature pattern,
a conserved region between the first and second transmembrane domains was selected.
[0589] Consensus pattern: G-x(3)-F-E-R-V-[FY]-x-A-[NQ]-x-N-C

[0590] [1] Jakobsson P.-J., Mancini J.A., Ford-Hutchinson A.W. J. Biol. Chem. 271:22203-22210(1996).
[0591] 194. FMN-dependent alpha-hydroxy acid dehydrogenases active site (FMN_dh)

A number of oxidoreductases that act on alpha-hydroxy acids and which are FMN-containing flavoproteins have been
shown [1,2,3] to be structurally related; these enzymes are:
- Lactate dehydrogenase (EC 1.1.2.3), which consists of a dehydrogenase domain and a heme-binding domain called
  cytochrome b2 and which catalyzes the conversion of lactate into pyruvate.
- Glycolate oxidase (EC 1.1.3.15) ((S)-2-hydroxy-acid oxidase), a peroxisomal enzyme that catalyzes the conversion
  of glycolate and oxygen to glyoxylate and hydrogen peroxide.
- Long chain alpha-hydroxy acid oxidase from rat (EC 1.1.3.15), a peroxisomal enzyme.
- Lactate 2-monooxygenase (EC 1.13.12.4) (lactate oxidase) from Mycobacterium smegmatis, which catalyzes the
  conversion of lactate and oxygen to acetate, carbon dioxide and water.
- (S)-mandelate dehydrogenase from Pseudomonas putida (gene mdlB), which catalyzes the reduction of
  (S)-mandelate to benzoylformate.
The first step in the reaction mechanism of these enzymes is the abstraction of the proton from the alpha-carbon of
the substrate producing a carbanion which can subsequently attach to the N5 atom of FMN. A conserved histidine has
been shown [4] to be involved in the removal of the proton. The region around this active site residue is highly
conserved and contains an arginine residue which is involved in substrate binding.
[0592] Consensus pattern: S-N-H-G-[AG]-R-Q [H is the active site residue] [R is a substrate-binding residue]-

[ 1] Giegel D.A., Williams C.H. Jr., Massey V. J. Biol. Chem. 265:6626-6632(1990).
[ 2] Tsou A.Y., Ransom S.C., Gerlt J.A., Buechter D.D., Babbitt P.C., Kenyon G.L. Biochemistry 29:9856-9862
(1990).
[ 3] Le K.H.D., Lederer F. J. Biol. Chem. 266:20877-20880(1991).
[ 4] Lindqvist Y., Branden C.-I. J. Biol. Chem. 264:3624-3628(1989).

[0593] 195. Flavin-binding monooxygenase-like (FMO-like)
[0594] This family includes FMO proteins, cyclohexanone monooxygenase
[0595] 196. (FPGS)

Folylpolyglutamate synthase signatures (aka Mur_ligase)

[0596] Folylpolyglutamate synthase (EC 6.3.2.17) (FPGS) [1] is the enzyme of folate metabolism that catalyzes
ATP-dependent addition of glutamate moieties to tetrahydrofolate.
[0597] Its sequence is moderately conserved between prokaryotes (gene folC) and eukaryotes. We developed two
signature patterns based on the conserved regions which are rich in glycine residues and could play a role in the
catalytic activity and/or in substrate binding.

[0598] Consensus pattern: [LIVMFY]-x-[LIVM]-[STAG]-G-T-[NK]-G-K-x-[ST]-x(7)-[LIVM](2)-x(3)-[GSK] Sequences
known to belong to this class detected by the pattern ALL.
[0599] Consensus pattern: [LIVMFY](2)-E-x-G-[LIVM]-[GA]-G-x(2)-D-x-[GST]-x-[LIVM](2) Sequences known to
belong to this class detected by the pattern ALL.

[0600] [ 1] Shane B., Garrow T., Brenner A., Chen L., Choi Y.J., Hsu J.C., Stover P. Adv. Exp. Med. Biol. 338:629-634
(1993).

[0601] 197. FYVE zinc finger 
[0602] The FYVE zinc finger is named after four proteins that it has been found in: Fab1, YOTB/ZK632.12, Vac1,
and EEA1. The FYVE finger has been shown to bind two Zn++ ions [1]. The FYVE finger has eight potential zinc
coordinating cysteine positions. Many members of this family also include two histidines in a motif R+HHC+XCG, where
+ represents a charged residue and X any residue. Members were included which do not conserve these histidine
residues but are clearly related.

[0603] [1] Stenmark H, Aasland R, Toh BH, D'Arrigo A, J Biol Chem 1996;271:24048-24054. [2] Gaullier JM,
Simonsen A, D'Arrigo A, Bremnes B, Stenmark H, Aasland R, Nature 1998;394:432-433.
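
The R+HHC+XCG motif described for the FYVE finger can be searched for directly once "+" is given a concrete meaning. The sketch below is illustrative only: it assumes that "charged residue" means D, E, K or R (histidine could arguably be added), and the function name and test peptides are hypothetical.

```python
import re

# Assumption: "+" (a charged residue) is taken to mean D, E, K or R.
CHARGED = "DEKR"

# R + H H C + X C G, with X matching any residue.
FYVE_MOTIF_RE = re.compile("R[%s]HHC[%s].CG" % (CHARGED, CHARGED))


def find_fyve_motifs(sequence: str):
    """Return (1-based start, matched motif) for each non-overlapping hit."""
    return [(m.start() + 1, m.group()) for m in FYVE_MOTIF_RE.finditer(sequence)]


# Hypothetical peptides: one with the motif, one with an uncharged second residue.
print(find_fyve_motifs("AARKHHCRACGAA"))
print(find_fyve_motifs("AARAHHCRACGAA"))
```

As the text notes, some family members do not conserve the two histidines, so a negative result from this motif search does not exclude a protein from the family.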
[0604] 198. F_actin_cap_B
F-actin capping protein beta subunit signature

[0605] The F-actin capping protein binds in a calcium-independent manner to the fast growing ends of actin filaments
(barbed end) thereby blocking the exchange of subunits at these ends. Unlike gelsolin and severin this protein does
not sever actin filaments. The F-actin capping protein is a heterodimer composed of two unrelated subunits: alpha and
beta.

[0606] The beta subunit is a protein of about 280 amino acid residues whose sequence is well conserved in eukaryotic
species [1]. As a signature pattern a conserved hexapeptide in the N-terminal section of the beta subunit was selected.
[0607] Consensus pattern: C-D-Y-N-R-D Sequences known to belong to this class detected by the pattern ALL.
[0608] [1] Amatruda J.F., Cannon J.F., Tatchell K., Hug C., Cooper J.A. Nature 344:352-354(1990).
[0609] 199. Isopenicillin N synthetase signatures (Fe_Asc_oxidored)

Isopenicillin N synthetase (IPNS) [1,2] is a key enzyme in the biosynthesis of penicillin and cephalosporin. In the
presence of oxygen, iron and ascorbate, it removes four hydrogen atoms from L-(alpha-aminoadipyl)-L-cysteinyl-D-valine
to form the azetidinone and thiazolidine rings of isopenicillin. IPNS is an enzyme of about 330 amino-acid residues.
Two cysteines are conserved in fungal and bacterial IPNS sequences; these may be involved in iron-binding and/or
substrate-binding. Cephalosporium acremonium DAOCS/DACS [3] is a bifunctional enzyme involved in cephalosporin
biosynthesis. The DAOCS domain, which is structurally related to IPNS, catalyzes the step from penicillin N to
deacetoxy-cephalosporin C, which is used as a substrate by DACS to form deacetylcephalosporin C. Streptomyces
clavuligerus possesses a monofunctional DAOCS enzyme (gene cefE) [4] also related to IPNS. Two signature patterns for
these enzymes were derived, centered around the conserved cysteine residues.
[0610] Consensus pattern: [RK]-x-[STA]-x(2)-S-x-C-Y-[SL]-

Consensus pattern: [LIVM](2)-x-C-G-[STA]-x(2)-[STAG]-x(2)-T-x-[DNG]-

[ 1] Martin J.F. Trends Biotechnol. 5:306-308(1987).
[ 2] Chen G., Shiffman D., Mevarech M., Aharonowitz Y. Trends Biotechnol. 8:105-111(1990).
[ 3] Samson S.M., Dotzlaf J.E., Slisz M.L., Becker G.W., van Frank R.M., Veal L.E., Yeh W.K., Miller J.R., Queener
S.W., Ingolia T.D. Bio/Technology 5:1207-1214(1987).
[ 4] Kovacevic S., Weigel B.J., Tobin M.B., Ingolia T.D., Miller J.R. J. Bacteriol. 171:754-760(1989).

[0611] 200. Fibrillarin signature 

Fibrillarin [1] is a component of a nucleolar small nuclear ribonucleoprotein (SnRNP) particle thought to participate in
the first step of the processing of pre-rRNA. In mammals, fibrillarin is associated with the U3, U8 and U13 small nuclear
RNAs [2]. Fibrillarin is an extremely well conserved protein of about 320 amino acid residues. Structurally it consists
of three different domains:
- An N-terminal domain of about 80 amino acids which is very rich in glycine and contains a number of dimethylated
  arginine residues (DMA).
- A central domain of about 90 residues which resembles that of RNA-binding proteins and contains an octameric
  sequence similar to the RNP-2 consensus found in such proteins.
- A C-terminal alpha-helical domain.
A protein evolutionarily related to fibrillarin has been found [3] in archaebacteria such as Methanococcus vannielii or
voltae. This protein (gene flpA) is involved in pre-rRNA processing. It lacks the Gly/Arg-rich N-terminal domain. As a
signature pattern, a region was selected that starts with and encompasses the RNP-2 like octapeptide sequence.

[0612] Consensus pattern: [GST]-[LIVMAP]-V-Y-A-[IV]-E-[FY]-[SA]-x-R-x(2)-R-[DE]-

[ 1] Aris J.P., Blobel G. Proc. Natl. Acad. Sci. U.S.A. 88:931-935(1991).
[ 2] Bandziulis R.J., Swanson M.S., Dreyfuss G. Genes Dev. 3:431-437(1989).
[ 3] Agha-Amiri K. J. Bacteriol. 176:2124-2127(1994).

[0613] 201. Filamin/ABP280 repeat

[0614] [1] Fucini P, Renner C, Herberhold C, Noegel AA, Holak TA, Nat Struct Biol 1997;4:223-230.
[0615] 202. Fucosyl transferase

[0616] This family of fucosyltransferases comprises the enzymes that transfer fucose from GDP-Fucose to GlcNAc in an
alpha1,3 linkage [1].
[0617] [1] Breton C, Oriol R, Imberty A; Glycobiology 1998;8:87-94.

[0618] 203. 2Fe-2S ferredoxins, iron-sulfur binding region signature (fer2A)
Ferredoxins [1] are a group of iron-sulfur proteins which mediate electron transfer in a wide variety of metabolic
reactions. Ferredoxins can be divided into several subgroups depending upon the physiological nature of the iron sulfur
cluster(s) and according to sequence similarities. One of these subgroups are the 2Fe-2S ferredoxins, which are
proteins or domains of around one hundred amino acid residues that bind a single 2Fe-2S iron-sulfur cluster. The
proteins that are known [2] to belong to this family are listed below.
- Ferredoxin from photosynthetic organisms; namely plants and algae where it is located in the chloroplast or
  cyanelle; and cyanobacteria.
- Ferredoxin from archaebacteria of the Halobacterium genus.
- Ferredoxin IV (gene pftA) and V (gene fdxD) from Rhodobacter capsulatus.
- Ferredoxin in the toluene degradation operon (gene xylT) and naphthalene degradation operon (gene nahT) of
  Pseudomonas putida.
- Hypothetical Escherichia coli protein yfaE.
- The N-terminal domain of the bifunctional ferredoxin/ferredoxin reductase electron transfer component of the
  benzoate 1,2-dioxygenase complex (gene benC) from Acinetobacter calcoaceticus, the toluene 4-monooxygenase
  complex (gene tmoF), the toluate 1,2-dioxygenase system (gene xylZ), and the xylene monooxygenase system
  (gene xylA) from Pseudomonas.
- The N-terminal domain of phenol hydroxylase protein p5 (gene dmpP) from Pseudomonas putida.
- The N-terminal domain of methane monooxygenase component C (gene mmoC) from Methylococcus capsulatus.
- The C-terminal domain of the vanillate degradation pathway protein vanB in a Pseudomonas species.
- The N-terminal domain of bacterial fumarate reductase iron-sulfur protein (gene frdB).
- The N-terminal domain of CDP-6-deoxy-3,4-glucoseen reductase (gene ascD) from Yersinia pseudotuberculosis.
- The central domain of eukaryotic succinate dehydrogenase (ubiquinone) iron-sulfur protein.
- The N-terminal domain of eukaryotic xanthine dehydrogenase.
- The N-terminal domain of eukaryotic aldehyde oxidase.
In the 2Fe-2S ferredoxins, four cysteine residues bind the iron-sulfur cluster. Three of these cysteines are clustered
together in the same region of the protein. Our signature pattern spans that iron-sulfur binding region.
[0619] Consensus pattern: C-{C}-{C}-[GA]-{C}-C-[GAST]-{CPDEKRHFYW}-C [The three C's are 2Fe-2S ligands]-

[ 1] Meyer J. Trends Ecol. Evol. 3:222-226(1988).
[ 2] Harayama S., Polissi A., Rekik M. FEBS Lett. 285:85-88(1991).
[0620] Adrenodoxin family, iron-sulfur binding region signature (fer2B)

Ferredoxins [1] are a group of iron-sulfur proteins which mediate electron transfer in a wide variety of metabolic
reactions. Ferredoxins can be divided into several subgroups depending upon the physiological nature of the iron sulfur
cluster(s) and according to sequence similarities. One family of ferredoxins groups together the following proteins that
all bind a single 2Fe-2S iron-sulfur cluster:
- Adrenodoxin (ADX) (adrenal ferredoxin), a vertebrate mitochondrial protein which transfers electrons from
  adrenodoxin reductase to cytochrome P450scc, which is involved in cholesterol side chain cleavage.
- Putidaredoxin (PTX), a Pseudomonas putida protein which transfers electrons from putidaredoxin reductase to
  cytochrome P450-cam, which is involved in the oxidation of camphor.
- Terpredoxin [2], a Pseudomonas protein which transfers electrons from terpredoxin reductase to cytochrome
  P450-terp, which is involved in the oxidation of alpha-terpineol.
- Rhodocoxin [3], a Rhodococcus protein which transfers electrons from rhodocoxin reductase to cytochrome
  CYP116 (thcB), which is involved in the degradation of thiocarbamate herbicides.
- Escherichia coli ferredoxin (gene fdx) [4] whose exact function is not yet known.
- Rhodobacter capsulatus ferredoxin VI [5], which may transfer electrons to a yet uncharacterized oxygenase.
- Caulobacter crescentus ferredoxin (gene fdxB) [6].
In these proteins, four cysteine residues bind the iron-sulfur cluster. Three of these cysteines are clustered together
in the same region of the protein. Our signature pattern spans that iron-sulfur binding region.

[0621] Consensus pattern: C-x(2)-[STAQ]-x-[STAMV]-C-[STA]-T-C-[HR] [The three C's are 2Fe-2S ligands]-

[ 1] Meyer J. Trends Ecol. Evol. 3:222-226(1988).
[ 2] Peterson J.A., Lu J.-Y., Geisselsoder J., Graham-Lorence S., Carmona C., Witney F., Lorence M.C. J. Biol.
Chem. 267:14193-14203(1992).
[ 3] Nagy I., Schoofs G., Compernolle F., Proost P., Vanderleyden J., De Mot R. J. Bacteriol. 177:676-687(1995).
[ 4] Ta D.T., Vickery L.E. J. Biol. Chem. 267:11120-11125(1992).
[ 5] Naud I., Vincon M., Garin J., Gaillard J., Forest E., Jouanneau Y. Eur. J. Biochem. 222:933-939(1994).
[ 6] Amemiya K. EMBL/Genbank: X51607.

[0622] 204. 4Fe-4S ferredoxins, iron-sulfur binding region signature (fer4) 

Ferredoxins [1] are a group of iron-sulfur proteins which mediate electron transfer in a wide variety of metabolic
reactions. Ferredoxins can be divided into several subgroups depending upon the physiological nature of the iron-sulfur
cluster(s). One of these subgroups are the 4Fe-4S ferredoxins, which are found in bacteria and which are thus often
referred to as 'bacterial-type' ferredoxins. The structure of these proteins [2] consists of the duplication of a domain of
twenty six amino acid residues; each of these domains contains four cysteine residues that bind to a 4Fe-4S center.
A number of proteins have been found [3] that include one or more 4Fe-4S binding domains similar to those of
bacterial-type ferredoxins. These proteins are listed below (references are only provided for recently determined
sequences).

[0623] - The iron-sulfur proteins of the succinate dehydrogenase and the fumarate reductase complexes (EC 1.3.99.1).
These enzyme complexes, which are components of the tricarboxylic acid cycle, each contain three subunits: a
flavoprotein, an iron-sulfur protein, and a b-type cytochrome. The iron-sulfur proteins contain three different iron-sulfur
centers: a 2Fe-2S, a 3Fe-3S and a 4Fe-4S. - Escherichia coli anaerobic glycerol-3-phosphate dehydrogenase (EC
1.1.99.5). This enzyme is composed of three subunits: A, B, and C. The C subunit seems to be an iron-sulfur protein
with two ferredoxin-like domains [...].
- Escherichia coli anaerobic dimethyl sulfoxide reductase. The B subunit of this enzyme [...] is an iron-sulfur
  protein with [...] 4Fe-4S ferredoxin-like domains.
- Escherichia coli formate hydrogenlyase. [...] (genes hycB and hycF) seem to be iron-sulfur proteins [...].
- [...] formate dehydrogenase (EC 1.2.1.2). The beta chain of this dimeric enzyme probably binds [...] 4Fe-4S
  centers.
- Escherichia coli formate dehydrogenases N and O (EC 1.2.1.2). The beta chain of these two enzymes (genes fdnH
  and fdoH) are iron-sulfur proteins with [...] ferredoxin-like domains.
- Desulfovibrio [...] hydrogenase (EC 1.18.99.1). The large chain of this enzyme binds three 4Fe-4S centers [...]
  located in the ferredoxin-like N-terminal [...].
- Methanobacterium [...] six tandemly repeated [...].
- [...] anaerobic sulfite [...].
- [...] protein which is [...] Entamoeba histolytica.
- Escherichia coli hypothetical protein yjjW, a protein with [...] potential 4Fe-4S centers.
The pattern of cysteine residues [...]

[06241 Consensus pattern: C*W^ x W 


[ 4] Huang C.J., Barrett E.L. [...]
[ 5] Knaff D.B. Trends Biochem. Sci. 13:460-461(1988).
[0625] 205. NifH/frxC family signatures

Nitrogenase (EC 1.18.6.1) is the enzyme system responsible for biological nitrogen fixation. Nitrogenase is an
oligomeric complex which consists of two components: component 1, which contains the active site for the reduction
of nitrogen to ammonia, and component 2 [...]. Component 2 is a homodimer of a protein (gene nifH) which binds a
single 4Fe-4S iron-sulfur cluster [2]. In the nitrogen fixation process nifH is first reduced by a protein such as
ferredoxin; the reduced protein then transfers electrons to component 1 with the concomitant consumption of ATP.
A number of proteins are known to be evolutionarily related to nifH:
- Chloroplast encoded frxC (or chlL) protein [3]. FrxC is encoded on the chloroplast genome of some plant species;
  its exact function is not known, but it could act as an electron carrier in the conversion of [...] chlorophyll.
- Rhodobacter capsulatus [...]

census pattern. E-x-G-G-P-x(2) faA * i „ ur cent er]- 


J.626] Consensus pattern: ^^^p , c b inds the iron-suHur center]- 

Consensus pattern: d-x-l-u 



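Ferritin [1,2] is one of the major non-heme iron storage proteins [...]. It consists of a mineral core [...] and a
multi-subunit protein shell which encloses the former and assures its solubility in an aqueous environment. In animals
the protein [...] there are generally two or more genes that encode for closely related subunits; in mammals there are
two subunits which are known as H(eavy) and L(ight). In plants ferritin is found in the chloroplast [3]. [...] two of
these regions to develop [...]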
ions can gain access to the central cavity of the molecule; this pattern also includes conserved acidic residues which
are potential metal-binding sites.

[0628] Consensus pattern: E-x-[KR]-E-x(2)-E-[KR]-[LF]-[LIVMA]-x(2)-Q-N-x-R-x-G-R [The three E's are potential iron
ligands]-

Consensus pattern: D-x(2)-[LIVMF]-[STAC]-[DH]-F-[LI]-[EN]-x(2)-[FY]-L-x(6)-[LIVM]-[KN] [The second D and the E are
potential iron ligands]-

[ 1] Crichton R.R., Charloteaux-Wauters M. Eur. J. Biochem. 164:485-506(1987).
[ 2] Theil E.C. Annu. Rev. Biochem. 56:289-315(1987).
[ 3] Ragland M., Briat J.-F., Gagnon J., Laulhere J.-P., Massenet O., Theil E.C. J. Biol. Chem. 265:18339-18344
(1990).

[0629] 207. Intermediate filaments signature (filament) 

Intermediate filaments (IF) [1,2,3] are proteins which are primordial components of the cytoskeleton and the nuclear
envelope. They generally form filamentous structures 8 to 14 nm wide. IF proteins are members of a very large multigene
family of proteins which has been subdivided into five major subgroups: - Type I: Acidic cytokeratins. - Type II: Basic
cytokeratins. - Type III: Vimentin, desmin, glial fibrillary acidic protein (GFAP), peripherin and plasticin. - Type IV:
Neurofilaments L, H and M, alpha-internexin and nestin. - Type V: Nuclear lamins A, B1, B2 and C. All IF proteins are
structurally similar in that they consist of: a central rod domain comprising some 300 to 350 residues which is arranged
in coiled-coil alpha-helices, with at least two short characteristic interruptions; an N-terminal non-helical domain (head)
of variable length; and a C-terminal domain (tail) which is also non-helical, and which shows extreme length variation
between different IF proteins. While IF proteins are evolutionarily and structurally related, they have limited sequence
homologies except in several regions of the rod domain. A conserved region at the C-terminal extremity of the rod
domain was used as a sequence pattern for this class of proteins.
[0630] Consensus pattern: [IV]-x-[TACI]-Y-[RKH]-x-[LM]-L-[DE]-

[1] Quinlan R., Hutchison C., Lane B. Protein Prof. 2:801-952(1995).
[2] Steinert P.M., Roop D.R. Annu. Rev. Biochem. 57:593-625(1988).
[3] Stewart M. Curr. Opin. Cell Biol. 2:91-100(1990).
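
The consensus patterns in this listing follow the PROSITE notation: a bracketed set lists the residues allowed at one position, x stands for any residue, and a number in parentheses is a repeat count or range. A minimal sketch of how such a pattern could be translated into a regular expression for scanning a sequence is given below. The function name prosite_to_regex and the test peptide are illustrative assumptions and are not part of the original disclosure.

```python
import re

# One pattern element: optional '<' anchor, then a residue, a set [..], an
# exclusion {..} or 'x', an optional repeat (n) or (m,n), optional '>' anchor.
_ELEMENT = re.compile(
    r"(<?)(\[[A-Z]+\]|\{[A-Z]+\}|[A-Zx])(?:\((\d+)(?:,(\d+))?\))?(>?)"
)


def prosite_to_regex(pattern):
    """Translate a PROSITE-style pattern into a Python regular expression."""
    parts = []
    # Strip the trailing '-' or '.' used in the listing, then split on '-'.
    for element in pattern.strip(" -.").split("-"):
        m = _ELEMENT.fullmatch(element)
        if m is None:
            raise ValueError("unrecognised pattern element: %r" % element)
        start, core, low, high, end = m.groups()
        if core == "x":
            core = "."                      # any residue
        elif core.startswith("{"):
            core = "[^" + core[1:-1] + "]"  # any residue except those listed
        repeat = ""
        if low is not None:
            repeat = "{" + low + ("," + high if high else "") + "}"
        parts.append(("^" if start else "") + core + repeat + ("$" if end else ""))
    return "".join(parts)


# Hypothetical peptide, made up for illustration only.
regex = prosite_to_regex("[IV]-x-[TACI]-Y-[RKH]-x-[LM]-L-[DE]-")
hit = re.search(regex, "AAIATYRKLLEGG")
print(regex, hit.group(0), hit.start() + 1)
```

The same helper accepts the ranged and anchored forms used elsewhere in this listing, such as x(7,8) or a leading '<'.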


[0631] 208. Flavodoxin signature 

Flavodoxins [1,E1] are electron-transfer proteins that function in various electron transport systems. Flavodoxins bind
one FMN molecule, which serves as a redox-active prosthetic group. Flavodoxins are functionally interchangeable with
ferredoxins. They have been isolated from prokaryotes, cyanobacteria, and some eukaryotic algae. The signature
pattern for these proteins is derived from a conserved region in their N-terminal section; this region is involved in the
binding of the FMN phosphate group.

[0632] Consensus pattern: [LIV]-[LIVFY]-[FY]-x-[ST]-x(2)-[AGC]-x-T-x(3)-A-x(2)-[LIV]-
[ 1] Wakabayashi S., Kimura K., Matsubara H., Rogers L.J. Biochem. J. 263:981-984(1989). 
[0633] 209. Growth factor and cytokines receptors family signatures (fn3)

A number of receptors for lymphokines, hematopoietic growth factors and growth hormone-related molecules have
been found [1 to 5] to share a common binding domain. Receptors known to belong to this family are: - Cytokine
receptor common beta chain. This chain is common to the IL-3, IL-5 and GM-CSF receptors. - Cytokine receptor
common gamma chain. This chain is common to the IL-2, IL-4, IL-7 and IL-13 receptors. - Ciliary neurotrophic factor
receptor (CNTFR). - Erythropoietin receptor (EPOR). - Granulocyte colony-stimulating factor receptor (G-CSFR). -
Granulocyte-macrophage colony-stimulating factor receptor alpha chain (GM-CSFR). - Interleukin-2 receptor beta
chain (IL2R-beta). - Interleukin-3 receptor alpha chain (IL3R). - Interleukin-4 receptor alpha chain (IL4R). - Interleukin-5
receptor alpha chain (IL5R). - Interleukin-6 receptor (IL6R). - Interleukin-7 receptor alpha chain (IL7R). - Interleukin-9
receptor (IL9R). - Growth hormone receptor (GRHR). - Prolactin receptor (PRLR). - Thrombopoietin receptor (TPOR).
The conserved region constitutes all or part of the extracellular ligand-binding region and is about 200 amino acid
residues long. In the N-terminal of this domain there are two pairs of cysteines known, in the growth hormone receptor,
to be involved in disulfide bonds.

[Schematic: extracellular region containing the four conserved cysteines (C), followed by the transmembrane region
and the cytoplasmic region.]

Two patterns to detect this family of receptors were used. The first one is derived from the first N-terminal disulfide
loop; the second is a tryptophan-rich pattern located at the C-terminal extremity of the extracellular region.

[0634] Consensus pattern: C-[LVFYR]-x(7,8)-[STIVDN]-C-x-W [The two C's are linked by a disulfide bond]-
Consensus pattern: [STGL]-x-W-[SG]-x-W-S-

[1] Bazan J.F. Biochem. Biophys. Res. Commun. 164:788-795(1989).
[2] Bazan J.F. Proc. Natl. Acad. Sci. U.S.A. 87:6934-6938(1990).
[3] Cosman D., Lyman S.D., Idzerda R.L., Beckmann M.P., Park L.S., Goodwin R.G., March C.J. Trends Biochem.
Sci. 15:265-270(1990).
[4] d'Andrea A.D., Fasman G.D., Lodish H.F. Cell 58:1023-1024(1989).
[5] d'Andrea A.D., Fasman G.D., Lodish H.F. Curr. Opin. Cell Biol. 2:648-651(1990).

[0635] 210. Phosphoribosylglycinamide formyltransferase active site (formyl_transf)

Phosphoribosylglycinamide formyltransferase (EC 2.1.2.2) (GART) [1] catalyzes the third step in de novo purine
biosynthesis, the transfer of a formyl group to 5'-phosphoribosylglycinamide. In higher eukaryotes, GART is part of a
multifunctional enzyme polypeptide that catalyzes three of the steps of purine biosynthesis. In bacteria, plants and
yeast, GART is a monofunctional protein of about 200 amino-acid residues. In the Escherichia coli enzyme, an aspartic
acid residue has been shown to be involved in the catalytic mechanism. The region around this active site residue is
well conserved in GART from prokaryotic and eukaryotic sources and can be used as a signature pattern. Mammalian
formyltetrahydrofolate dehydrogenase (EC 1.5.1.6) [2] is a cytosolic enzyme responsible for the NADP-dependent
decarboxylative reduction of 10-formyltetrahydrofolate into tetrahydrofolate. It is a protein of about 900 amino acids
consisting of three domains; the N-terminal domain (200 residues) is structurally related to GARTs. Escherichia coli
methionyl-tRNA formyltransferase (EC 2.1.2.9) (gene fmt) [3] is the enzyme responsible for modifying the free amino group
of the aminoacyl moiety of methionyl-tRNA(fMet). The central part of fmt seems to be evolutionarily related to GART's active
site region.

[0636] Consensus pattern: G-x-[STM]-[IVT]-x-[FYWVQ]-[VMAT]-x-[DEVM]-x-[LIVMY]-D-x-G-x(2)-[LIVT]-x(6)-
[LIVM] [D is the active site residue]-

[1] Inglese J., Smith J.M., Benkovic S.J. Biochemistry 29:6678-6687(1990).
[2] Cook R.J., Lloyd R.S., Wagner C. J. Biol. Chem. 266:4965-4973(1991).
[3] Guillon J.-M., Mechulam Y., Schmitter J.-M., Blanquet S., Fayat G. J. Bacteriol. 174:4294-4301(1992).

[0637] 211. G10 protein signatures

A Xenopus protein known as G10 [1] has been found to be highly conserved in a wide range of eukaryotic species.
The function of G10 is still unknown. G10 is a protein of about 17 to 18 Kd (143 to 157 residues) which is hydrophilic
and whose C-terminal half is rich in cysteines and could be involved in metal-binding. As signature patterns, two of
these cysteine-rich segments were selected.

[0638] Consensus pattern: L-C-C-x-[KR]-C-x(4)-[DE]-x-N-x(4)-C-x-C-R-V-P-
Consensus pattern: C-x-H-C-G-C-[KRH]-G-C-[SA]-

[0639] [1] McGrew L.L., Dworkin-Rastl E., Dworkin M.B., Richter J.D. Genes Dev. 3:803-815(1989).
[0640] 212. G-protein alpha subunit

[0641] G proteins couple receptors of extracellular signals to intracellular signaling pathways. The G protein alpha 
subunit binds guanyl nucleotide and is a weak GTPase. Number of members: 195 

[1] Coleman DE, Berghuis AM, Lee E, Linder ME, Gilman AG, Sprang SR, Science 1994;265:1405-1412. 
[2] How G proteins work: a continuing story. Coleman DE, Sprang SR, Trends Biochem Sci 1996;21:41-44.

[0642] 213. Glucose-6-phosphate dehydrogenase active site (G6PD) 

Glucose-6-phosphate dehydrogenase (EC 1.1.1.49) (G6PD) [1] catalyzes the first step in the pentose pathway, the
oxidation of glucose-6-phosphate to gluconolactone 6-phosphate. A lysine residue has been identified as a reactive
nucleophile associated with the activity of the enzyme. The sequence around this lysine is totally conserved from
bacterial to mammalian G6PD's and can be used as a signature pattern.
[0643] Consensus pattern: D-H-Y-L-G-K-[EGK] [K is the active site residue]-

[0644] [1] Jeffery J., Persson B., Wood I., Bergman T., Jeffery R., Joernvall H. Eur. J. Biochem. 212:41-49(1993).
[0645] 214. GATA-type zinc finger domain
The GATA family of transcription factors are proteins that bind to DNA sites with the consensus sequence (A/T)GATA(A/G),
found within the regulatory region of a number of genes. Proteins currently known to belong to this family are:
- GATA-1 [1] (also known as Eryf1, GF-1 or NF-E1), which binds to the GATA region of globin genes and other genes
expressed in erythroid cells. It is a transcriptional activator which probably serves as a general 'switch' factor for
erythroid development. - GATA-2 [2], a transcriptional activator which regulates endothelin-1 gene expression in endothelial
cells. - GATA-3 [3], a transcriptional activator which binds to the enhancer of the T-cell receptor alpha and delta genes.
- GATA-4 [4], a transcriptional activator expressed in endodermally derived tissues and heart. - Drosophila protein
pannier (or DGATAa) (gene pnr) which acts as a repressor of the achaete-scute complex (as-c). - Bombyx mori BCFI
[5], which regulates the expression of chorion genes. - Caenorhabditis elegans elt-1 and elt-2, transcriptional activators
of genes containing the GATA region, including vitellogenin genes [6]. - Ustilago maydis urbs1 [7], a protein involved
in the repression of the biosynthesis of siderophores. - Fission yeast protein GAF2. All these transcription factors contain
a pair of highly similar 'zinc finger' type domains with the consensus sequence C-x2-C-x17-C-x2-C. Some other proteins
contain a single zinc finger motif highly related to those of the GATA transcription factors. These proteins are: -
Drosophila box A-binding factor (ABF) (also known as protein serpent (gene srp)) which may function as a transcriptional
activator protein and may play a key role in the organogenesis of the fat body. - Emericella nidulans areA [8], a
transcriptional activator which mediates nitrogen metabolite repression. - Neurospora crassa nit-2 [9], a transcriptional
activator which turns on the expression of genes coding for enzymes required for the use of a variety of secondary
nitrogen sources, during conditions of nitrogen limitation. - Neurospora crassa white collar proteins 1 and 2 (WC-1 and
WC-2), which control expression of light-regulated genes. - Saccharomyces cerevisiae DAL81 (or UGA43), a negative
nitrogen regulatory protein. - Saccharomyces cerevisiae GLN3, a positive nitrogen regulatory protein. - Saccharomyces
cerevisiae GAT1. - Saccharomyces cerevisiae GZF3.

[0646] Consensus pattern: C-x-[DN]-C-x(4,5)-[ST]-x(2)-W-[HR]-[RK]-x(3)-[GN]-x(3,4)-C-N-[AS]-C [The four C's are
zinc ligands]

[1] Trainor C.D., Evans T., Felsenfeld G., Boguski M.S. Nature 343:92-96(1990).
[2] Lee M.E., Temizer D.T., Clifford J.A., Quertermous T. J. Biol. Chem. 266:16188-16192(1991).
[3] Ho I.-C., Vorhees P., Marin N., Oakley B.K., Tsai S.-F., Orkin S.H., Leiden J.M. EMBO J. 10:1187-1192(1991).
[4] Spieth J., Shim Y.H., Lea K., Conrad R., Blumenthal T. Mol. Cell. Biol. 11:4651-4659(1991).
[5] Drevet J.R., Skeiky Y.A., Iatrou K. J. Biol. Chem. 269:10660-10667(1994).
[6] Hawkins M.G., McGhee J.D. J. Biol. Chem. 270:14666-14671(1995).
[7] Voisard C.P.O., Wang J., Xu P., Leong S.A., McEvoy J.L. Mol. Cell. Biol. 13:7091-7100(1993).
[8] Arst H.N. Jr., Kudla B., Martinez-Rossi N.M., Caddick M.X., Sibley S., Davies R.W. Trends Genet. 5:291-291(1989).
[9] Fu Y.-H., Marzluf G.A. Mol. Cell. Biol. 10:1056-1065(1990).
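
The GATA-type consensus pattern above marks four cysteines as zinc ligands. As a hedged illustration, the sketch below writes that pattern as a regular expression with one capturing group per ligand cysteine and reports their positions in a sequence. The regular expression is a hand translation of the pattern; the function name zinc_ligand_positions and the test sequence are assumptions made up for this example.

```python
import re

# C-x-[DN]-C-x(4,5)-[ST]-x(2)-W-[HR]-[RK]-x(3)-[GN]-x(3,4)-C-N-[AS]-C
# Each of the four zinc-ligand cysteines is wrapped in a capturing group.
GATA_FINGER = re.compile(
    r"(C).[DN](C).{4,5}[ST].{2}W[HR][RK].{3}[GN].{3,4}(C)N[AS](C)"
)


def zinc_ligand_positions(sequence):
    """Return, for each match, the 1-based positions of the four ligand Cys."""
    hits = []
    for match in GATA_FINGER.finditer(sequence):
        hits.append([match.start(group) + 1 for group in range(1, 5)])
    return hits


# Hypothetical sequence built to satisfy the pattern, not a real protein.
example = "MM" + "CANCAAAATAAWRRAAANAAACNAC" + "KK"
print(zinc_ligand_positions(example))
```

Capturing groups keep the ligand coordinates tied to the match itself, so the variable-length gaps x(4,5) and x(3,4) do not have to be resolved by hand.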

[0647] 215. Glutamine amidotransferases class-I active site (GATase)

A large group of biosynthetic enzymes are able to catalyze the removal of the ammonia group from glutamine and then
to transfer this group to a substrate to form a new carbon-nitrogen group. This catalytic activity is known as glutamine
amidotransferase (GATase) (EC 2.4.2.-) [1]. The GATase domain exists either as a separate polypeptide subunit or
as part of a larger polypeptide fused in different ways to a synthase domain. On the basis of sequence similarities two
classes of GATase domains have been identified [2,3]: class-I (also known as trpG-type) and class-II (also known as
purF-type). Class-I GATase domains have been found in the following enzymes: - The second component of anthranilate
synthase (AS) (EC 4.1.3.27) [4]. AS catalyzes the biosynthesis of anthranilate from chorismate and glutamine.
AS is generally a dimeric enzyme: the first component can synthesize anthranilate using ammonia rather than
glutamine, whereas component II provides the GATase activity. In some bacteria and in fungi the GATase component
of AS is part of a multifunctional protein that also catalyzes other steps of the biosynthesis of tryptophan. - The second
component of 4-amino-4-deoxychorismate (ADC) synthase (EC 4.1.3.-), a dimeric prokaryotic enzyme that functions
in the pathway that catalyzes the biosynthesis of para-aminobenzoate (PABA) from chorismate and glutamine. The
second component (gene pabA) provides the GATase activity [4]. - CTP synthase (EC 6.3.4.2). CTP synthase catalyzes
the final reaction in the biosynthesis of pyrimidine, the ATP-dependent formation of CTP from UTP and glutamine. CTP
synthase is a single chain enzyme that contains two distinct domains; the GATase domain is in the C-terminal section
[2]. - GMP synthase (glutamine-hydrolyzing) (EC 6.3.5.2). GMP synthase catalyzes the ATP-dependent formation of
GMP from xanthosine 5'-phosphate and glutamine. GMP synthase is a single chain enzyme that contains two distinct
domains; the GATase domain is in the N-terminal section [5]. - Glutamine-dependent carbamoyl-phosphate synthase
(EC 6.3.5.5) (GD-CPSase); an enzyme involved in both arginine and pyrimidine biosynthesis and which catalyzes the
ATP-dependent formation of carbamoyl phosphate from glutamine and carbon dioxide. In bacteria GD-CPSase is
composed of two subunits: the large chain (gene carB) provides the CPSase activity, while the small chain (gene carA)
provides the GATase activity. In yeast the enzyme involved in arginine biosynthesis is also composed of two subunits:
CPA1 (GATase), and CPA2 (CPSase). In most eukaryotes, the first three steps of pyrimidine biosynthesis are catalyzed
by a large multifunctional enzyme (called URA2 in yeast, rudimentary in Drosophila, and CAD in mammals). The
GATase domain is located at the N-terminal extremity of this polyprotein [6]. -
Phosphoribosylformylglycinamidine synthase II (EC 6.3.5.3), an enzyme that catalyzes the fourth step in the de novo
biosynthesis of purines. In some species of bacteria, FGAM synthase II is composed of two subunits: a small chain
(gene purQ) which provides the GATase activity and a large chain (gene purL) which provides the aminator activity. -
The histidine amidotransferase hisH, an enzyme that catalyzes the fifth step in the biosynthesis of histidine in
prokaryotes. In the second component of AS a cysteine has been shown [7] to be essential for the amidotransferase activity.
The sequence around this residue is well conserved in all the above GATase domains and can be used as a signature
pattern for class-I GATase.

[0648] Consensus pattern: [PAS]-[LIVMFYT]-[LIVMFY]-G-[LIVMFY]-C-[LIVMFYN]-G-x-[QEH]-x-[LIVMFA] [C is the
active site residue]-

[1] Buchanan J.M. Adv. Enzymol. 39:91-183(1973).
[2] Weng M., Zalkin H. J. Bacteriol. 169:3023-3028(1987).
[3] Nyunoya H., Lusty C.J. J. Biol. Chem. 259:9790-9798(1984).
[4] Crawford I.P. Annu. Rev. Microbiol. 43:567-600(1989).
[5] Zalkin H., Argos P., Narayana S.V.L., Tiedeman A.A., Smith J.M. J. Biol. Chem. 260:3350-3354(1985).
[6] Davidson J.N., Chen K.C., Jamison R.S., Musmanno L.A., Kern C.B. BioEssays 15:157-164(1993).
[7] Tso J.Y., Hermodson M.A., Zalkin H. J. Biol. Chem. 255:1451-1457(1980).

[0649] 216. Glutamine amidotransferases class-II active site (GATase_2)

A large group of biosynthetic enzymes are able to catalyze the removal of the ammonia group from glutamine and then
to transfer this group to a substrate to form a new carbon-nitrogen group. This catalytic activity is known as glutamine
amidotransferase (GATase) (EC 2.4.2.-) [1]. The GATase domain exists either as a separate polypeptide subunit or
as part of a larger polypeptide fused in different ways to a synthase domain. On the basis of sequence similarities two
classes of GATase domains have been identified [2,3]: class-I (also known as trpG-type) and class-II (also known as
purF-type). Class-II GATase domains have been found in the following enzymes: - Amido phosphoribosyltransferase
(glutamine phosphoribosylpyrophosphate amidotransferase) (EC 2.4.2.14). An enzyme which catalyzes the first step
in purine biosynthesis, the transfer of the ammonia group of glutamine to PRPP to form 5-phosphoribosylamine (gene
purF in bacteria, ADE4 in yeast). - Glucosamine--fructose-6-phosphate aminotransferase (EC 2.6.1.16). This enzyme
catalyzes a key reaction in amino sugar synthesis, the formation of glucosamine 6-phosphate from fructose 6-phosphate
and glutamine (gene glmS in Escherichia coli, nodM in Rhizobium, GFA1 in yeast). - Asparagine synthetase
(glutamine-hydrolyzing) (EC 6.3.5.4). This enzyme is responsible for the synthesis of asparagine from aspartate and
glutamine. A cysteine is present at the N-terminal extremity of the mature form of all these enzymes. The cysteine has
been shown, in amido phosphoribosyltransferase [4] and in asparagine synthetase [5], to be important for the catalytic
mechanism.

[0650] Consensus pattern: <x(0,11)-C-[GS]-[IV]-[LIVMFYW]-[AG] [C is the active site residue]-

[1] Buchanan J.M. Adv. Enzymol. 39:91-183(1973).
[2] Weng M., Zalkin H. J. Bacteriol. 169:3023-3028(1987).
[3] Nyunoya H., Lusty C.J. J. Biol. Chem. 259:9790-9798(1984).
[4] van Heeke G., Schuster M. J. Biol. Chem. 264:5503-5509(1989).
[5] Vollmer S.J., Switzer R.L., Hermodson M.A., Bower S.G., Zalkin H. J. Biol. Chem. 258:10582-10585(1983).
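
In the class-II pattern above, the leading '<' anchors the match to the N-terminus, and x(0,11) allows up to eleven residues before the active-site cysteine. The sketch below shows one way this anchoring could be expressed with a regular expression. The function name active_site_cysteine and the test sequences are hypothetical and serve only as an illustration.

```python
import re

# <x(0,11)-C-[GS]-[IV]-[LIVMFYW]-[AG]: '<' becomes '^', so the cysteine must
# fall within the first twelve residues of the sequence.
CLASS_II_GATASE = re.compile(r"^.{0,11}C[GS][IV][LIVMFYW][AG]")


def active_site_cysteine(sequence):
    """Return the 1-based position of the active-site Cys, or None."""
    match = CLASS_II_GATASE.match(sequence)
    if match is None:
        return None
    # The match ends on the fifth residue of the motif, so the cysteine is
    # four residues before the (1-based) end of the match.
    return match.end() - 4


# Hypothetical sequences, made up for illustration only.
print(active_site_cysteine("MCGIVGAAAA"))
print(active_site_cysteine("A" * 12 + "CGIVG"))
```

The second call returns None because the motif starts too far from the N-terminus, which is the behaviour the anchor is meant to enforce.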

[0651] 217. GDP dissociation inhibitor (GDI) 

[0652] [1] Schalk I, Zeng K, Wu SK, Stura EA, Matteson J, Huang M, Tandon A, Wilson IA, Balch WE, Nature 1996;
381:42-48.

[0653] 218. Oxidoreductase family (GFO_IDH_MocA)

[0654] This family of enzymes utilise NADP or NAD. This family is called the GFO/IDH/MOCA family in Swiss-Prot.
[0655] [1] Kingston RL, Scopes RK, Baker EN, Structure 1996;4:1413-1428.
[0656] 219. GHMP kinases putative ATP-binding domain 

The following kinases contain, in their N-terminal section, a conserved Gly/Ser-rich region which is probably involved
in the binding of ATP [1]. These kinases are listed below. - Galactokinase (EC 2.7.1.6). - Homoserine kinase (EC
2.7.1.39). - Mevalonate kinase (EC 2.7.1.36). - Phosphomevalonate kinase (EC 2.7.4.2). This group of kinases was
called 'GHMP' (from the first letter of their substrate).

Consensus pattern: [LIVM]-[PK]-x-[GSTA]-x(0,1)-G-L-[GS]-S-S-[GSA]-[GSTAC]-
[0657] [1] Tsay Y.H., Robinson G.W. Mol. Cell. Biol. 11:620-631(1991).

[0658] 220. Glucose inhibited division protein A family signatures (GIDA)

Bacterial glucose inhibited division protein A (gene gidA) is a protein of 70 Kd whose function is not yet known and
whose sequence is highly conserved. It is evolutionarily related to yeast hypothetical protein YGL236C, Caenorhabditis
elegans hypothetical protein F52H3.2 and a Bacillus subtilis protein called gid (and which is different from B. subtilis
gidA). Two highly conserved regions were selected as signature patterns. Both regions are located in the central region
of the protein.

[0659] Consensus pattern: [GS]-[PT]-x-Y-C-P-S-[LIVM]-E-x-K-[LIVM]-x-[KR]-
Consensus pattern: A-G-Q-x-[NT]-G-x(2)-G-Y-x-E-[SAG](3)-[QS]-G-[LIVM](2)-A-G-[LIVMT]-N-A-
[0660] 221. (GLFV_dehydrog)
Glu / Leu / Phe / Val dehydrogenases active site

- Glutamate dehydrogenases (EC 1.4.1.2, EC 1.4.1.3, and EC 1.4.1.4) (GluDH) are enzymes that catalyze the NAD-
or NADP-dependent reversible deamination of glutamate into alpha-ketoglutarate [1,2]. GluDH isozymes are
generally involved with either ammonia assimilation or glutamate catabolism.

- Leucine dehydrogenase (EC 1.4.1.9) (LeuDH) is a NAD-dependent enzyme that catalyzes the reversible
deamination of leucine and several other aliphatic amino acids to their keto analogues [3].

- Phenylalanine dehydrogenase (EC 1.4.1.20) (PheDH) is a NAD-dependent enzyme that catalyzes the reversible
deamination of L-phenylalanine into phenylpyruvate [4].

- Valine dehydrogenase (EC 1.4.1.8) (ValDH) is a NADP-dependent enzyme that catalyzes the reversible
deamination of L-valine into 3-methyl-2-oxobutanoate [5].

[0661] These dehydrogenases are structurally and functionally related. A conserved lysine residue located in a
glycine-rich region has been implicated in the catalytic mechanism. The conservation of the region around this residue
allows the derivation of a signature pattern for such type of enzymes.

[0662] Consensus pattern: [LIV]-x(2)-G-G-[SAG]-K-x-[GV]-x(3)-[DNST]-[PL] [K is the active site residue] Sequences
known to belong to this class detected by the pattern: ALL.

[0663] Note: all known sequences from this family have Pro in the last position of the pattern with the exception of
yeast GluDH, which has Leu.

[1] Britton K.L., Baker P.J., Rice D.W., Stillman T.J. Eur. J. Biochem. 209:851-859(1992).
[2] Benachenhou-Lahfa N., Forterre P., Labedan B. J. Mol. Evol. 36:335-346(1993).
[3] Nagata S., Tanizawa K., Esaki N., Sakamoto Y., Ohshima T., Tanaka H., Soda K. Biochemistry 27:9056-9062
(1988).
[4] Takada H., Yoshimura T., Ohshima T., Esaki N., Soda K. J. Biochem. 109:371-376(1991).
[5] Hutchinson C.R., Tang L. J. Bacteriol. 175:4176-4185(1993).
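
The note on the Glu / Leu / Phe / Val dehydrogenase pattern says that the last position is Pro in all known sequences except yeast GluDH, which has Leu. The sketch below shows how a scan could report both the active-site lysine and the residue found at that last position. It assumes the consensus pattern reads [LIV]-x(2)-G-G-[SAG]-K-x-[GV]-x(3)-[DNST]-[PL]; the function name scan_glfv and the test sequences are hypothetical.

```python
import re

# Assumed reading of the pattern:
# [LIV]-x(2)-G-G-[SAG]-K-x-[GV]-x(3)-[DNST]-[PL]
# Group 1 captures the active-site lysine, group 2 the last position.
GLFV_SITE = re.compile(r"[LIV].{2}GG[SAG](K).[GV].{3}[DNST]([PL])")


def scan_glfv(sequence):
    """Return (1-based lysine position, last-position residue) or None."""
    match = GLFV_SITE.search(sequence)
    if match is None:
        return None
    return match.start(1) + 1, match.group(2)


# Hypothetical sequences, made up for illustration only.
print(scan_glfv("MMLAAGGSKAGAAADPQQ"))
print(scan_glfv("MMLAAGGSKAGAAADLQQ"))
```

Reporting the last-position residue alongside the match makes the Pro versus Leu distinction described in the note visible for each hit.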

[0664] 222. GMC oxidoreductases signatures 

The following FAD flavoprotein oxidoreductases have been found [1,2] to be evolutionarily related. These enzymes,
which are called 'GMC oxidoreductases', are listed below. - Glucose oxidase (EC 1.1.3.4) (GOX) from Aspergillus niger.
Reaction catalyzed: glucose + oxygen -> delta-gluconolactone + hydrogen peroxide. - Methanol oxidase (EC 1.1.3.13)
(MOX) from fungi. Reaction catalyzed: methanol + oxygen -> formaldehyde + hydrogen peroxide. - Choline
dehydrogenase (EC 1.1.99.1) (CHD) from bacteria. Reaction catalyzed: choline + unknown acceptor -> betaine aldehyde
+ reduced acceptor. - Glucose dehydrogenase (GLD) (EC 1.1.99.10) from Drosophila. Reaction catalyzed: glucose +
unknown acceptor -> delta-gluconolactone + reduced acceptor. - Cholesterol oxidase (CHOD) (EC 1.1.3.6) from
Brevibacterium sterolicum and Streptomyces strain SA-COO. Reaction catalyzed: cholesterol + oxygen -> cholest-4-en-
3-one + hydrogen peroxide. - AlkJ [3], an alcohol dehydrogenase from Pseudomonas oleovorans, which converts
aliphatic medium-chain-length alcohols into aldehydes. This family also includes a lyase: - (R)-mandelonitrile lyase
(EC 4.1.2.10) (hydroxynitrile lyase) from plants [4], an enzyme involved in cyanogenesis, the release of hydrogen cyanide
from injured tissues. These enzymes are proteins of size ranging from 556 (CHD) to 664 (MOX) amino acid residues
which share a number of regions of sequence similarities. One of these regions, located in the N-terminal section,
corresponds to the FAD ADP-binding domain. The function of the other conserved domains is not yet known; two of
these domains were selected as signature patterns. The first one is located in the N-terminal section of these enzymes,
about 50 residues after the ADP-binding domain, while the second one is located in the central section.

[0665] Consensus pattern: [GA]-[RKN]-x-[LIV]-G(2)-[GST](2)-x-[LIVM]-N-x(3)-[FYWA]-x(2)-[PAG]-x(5)-[DNESH]-
Consensus pattern: [GS]-[PSTA]-x(2)-[ST]-P-x-[LIVM](2)-x(2)-S-G-[LIVM]-G-

[ 1] Cavener D.R. J. Mol. Biol. 223:811-814(1992). 
[ 2] Henikoff S., Henikoff J.G. Genomics 19:97-107(1994). 
[3] van Beilen J.B., Eggink G., Enequist H., Bos R., Witholt B. Mol. Microbiol. 6:3121-3136(1992).
[4] Cheng I.P., Poulton J.E. Plant Cell Physiol. 34:1139-1143(1993).

[0666] 223. (GMP_synt_C)
Glutamine amidotransferases class-I active site
[0667] A large group of biosynthetic enzymes are able to catalyze the removal of the ammonia group from glutamine
and then to transfer this group to a substrate to form a new carbon-nitrogen group. This catalytic activity is known as
glutamine amidotransferase (GATase) (EC 2.4.2.-) [1]. The GATase domain exists either as a separate polypeptide
subunit or as part of a larger polypeptide fused in different ways to a synthase domain. On the basis of sequence
similarities two classes of GATase domains have been identified [2,3]: class-I (also known as trpG-type) and class-II
(also known as purF-type). Class-I GATase domains have been found in the following enzymes:

- The second component of anthranilate synthase (AS) (EC 4.1.3.27) [4]. AS catalyzes the biosynthesis of
anthranilate from chorismate and glutamine. AS is generally a dimeric enzyme: the first component can synthesize
anthranilate using ammonia rather than glutamine, whereas component II provides the GATase activity. In some
bacteria and in fungi the GATase component of AS is part of a multifunctional protein that also catalyzes other
steps of the biosynthesis of tryptophan.

- The second component of 4-amino-4-deoxychorismate (ADC) synthase (EC 4.1.3.-), a dimeric prokaryotic enzyme
that functions in the pathway that catalyzes the biosynthesis of para-aminobenzoate (PABA) from chorismate and
glutamine. The second component (gene pabA) provides the GATase activity [4].

- CTP synthase (EC 6.3.4.2). CTP synthase catalyzes the final reaction in the biosynthesis of pyrimidine, the
ATP-dependent formation of CTP from UTP and glutamine. CTP synthase is a single chain enzyme that contains two
distinct domains; the GATase domain is in the C-terminal section [2].

- GMP synthase (glutamine-hydrolyzing) (EC 6.3.5.2). GMP synthase catalyzes the ATP-dependent formation of
GMP from xanthosine 5'-phosphate and glutamine. GMP synthase is a single chain enzyme that contains two
distinct domains; the GATase domain is in the N-terminal section [5].

- Glutamine-dependent carbamoyl-phosphate synthase (EC 6.3.5.5) (GD-CPSase); an enzyme involved in both
arginine and pyrimidine biosynthesis and which catalyzes the ATP-dependent formation of carbamoyl phosphate
from glutamine and carbon dioxide. In bacteria GD-CPSase is composed of two subunits: the large chain (gene
carB) provides the CPSase activity, while the small chain (gene carA) provides the GATase activity. In yeast the
enzyme involved in arginine biosynthesis is also composed of two subunits: CPA1 (GATase), and CPA2 (CPSase).
In most eukaryotes, the first three steps of pyrimidine biosynthesis are catalyzed by a large multifunctional enzyme
(called URA2 in yeast, rudimentary in Drosophila, and CAD in mammals). The GATase domain is located at the
N-terminal extremity of this polyprotein [6].

- Phosphoribosylformylglycinamidine synthase II (EC 6.3.5.3), an enzyme that catalyzes the fourth step in the de
novo biosynthesis of purines. In some species of bacteria, FGAM synthase II is composed of two subunits: a small
chain (gene purQ) which provides the GATase activity and a large chain (gene purL) which provides the aminator
activity.

- The histidine amidotransferase hisH, an enzyme that catalyzes the fifth step in the biosynthesis of histidine in
prokaryotes.

[0668] In the second component of AS a cysteine has been shown [7] to be essential for the amidotransferase activity.
The sequence around this residue is well conserved in all the above GATase domains and can be used as a signature
pattern for class-I GATase.

[0669] Consensus pattern: [PAS]-[LIVMFYT]-[LIVMFY]-G-[LIVMFY]-C-[LIVMFYN]-G-x-[QEH]-x-[LIVMFA] [C is the
active site residue] Sequences known to belong to this class detected by the pattern: ALL, except for 6 sequences.
[0670] Note: in the first position of the pattern Pro is found in all cases except in the slime mold GD-CPSase where
it is replaced by Ala.

[1] Buchanan J.M. Adv. Enzymol. 39:91-183(1973).
[2] Weng M., Zalkin H. J. Bacteriol. 169:3023-3028(1987).
[3] Nyunoya H., Lusty C.J. J. Biol. Chem. 259:9790-9798(1984).
[4] Crawford I.P. Annu. Rev. Microbiol. 43:567-600(1989).
[5] Zalkin H., Argos P., Narayana S.V.L., Tiedeman A.A., Smith J.M. J. Biol. Chem. 260:3350-3354(1985).
[6] Davidson J.N., Chen K.C., Jamison R.S., Musmanno L.A., Kern C.B. BioEssays 15:157-164(1993).
[7] Tso J.Y., Hermodson M.A., Zalkin H. J. Biol. Chem. 255:1451-1457(1980).

[0671] 224. Glutathione peroxidases signatures (GSHPx)

Glutathione peroxidase (EC 1.11.1.9) (GSHPx) [1,2] is an enzyme that catalyzes the reduction of hydroperoxides
by glutathione. Its main function is to protect against the damaging effect of endogenously formed hydroperoxides.
In higher vertebrates at least four forms of GSHPx are known to exist: a ubiquitous cytosolic form (GSHPx-1), a
gastrointestinal cytosolic form (GSHPx-GI) [3], a plasma secreted form (GSHPx-P) [4], and an epididymal secretory form
(GSHPx-EP). In addition to these characterized forms, the sequence of a protein of unknown function [5] has been
shown to be evolutionarily related to those of GSHPx's. In filarial nematode parasites such as Brugia pahangi, the major
soluble cuticular protein, known as gp29, is a secreted GSHPx which could provide a mechanism of resistance to the
immune reaction of the mammalian host by neutralizing the products of the oxidative burst of leukocytes [6]. Escherichia
coli protein btuE, a periplasmic protein involved in the transport of vitamin B12, is also evolutionarily related to GSHPx's;
the significance of this relationship is not yet clear. Selenium, in the form of selenocysteine [7], is part of the catalytic
site of GSHPx. The sequence around the selenocysteine residue is moderately well conserved in GSHPx's and the
related proteins and can be used as a signature pattern. As a second signature for this family of proteins a highly
conserved octapeptide located in the central section of these proteins was selected.
[0672] Consensus pattern: [GN]-[RKHNFYC]-x-[LIVMFC]-[LIVMF](2)-x-N-[VT]-x-[STC]-x-C-[GA]-x-T [C is the active
site selenocysteine residue]
Consensus pattern: [LIV]-[AGD]-F-P-[CS]-[NG]-Q-

[1] Mannervik B. Meth. Enzymol. 113:490-495(1985).
[2] Mullenbach G.T., Tabrizi A., Irvine B.D., Bell G.I., Tainer J.A., Hallewell R.A. Protein Eng. 2:239-246(1988).
[3] Chu F.F., Doroshow J.H., Esworthy R.S. J. Biol. Chem. 268:2571-2576(1993).
[4] Takahashi K., Akasaka M., Yamamoto Y., Kobayashi C., Mizoguchi J., Koyama J. J. Biochem. 108:145-148
(1990).
[5] Dunn D.K., Howells D.D., Richardson J., Goldfarb P.S. Nucleic Acids Res. 17:6390-6390(1989).
[6] Cookson E., Blaxter M.L., Selkirk M.E. Proc. Natl. Acad. Sci. U.S.A. 89:5837-5841(1992).
[7] Stadtman T.C. Annu. Rev. Biochem. 59:111-127(1990).

[0673] 225. (GST) 
Glutathione S-transferases 

[0674] Function: conjugation of reduced glutathione to a variety of targets. Also included in the alignment, but not
GSTs, are:
- S-crystallins from squid. Similarity to GST was previously noted.
- Eukaryotic elongation factors 1-gamma. Not known to have GST activity; similarity not previously recognized.
Supported by HMM and manual alignment inspection.
- HSP26 family of stress-related proteins, including auxin-regulated proteins in plants and stringent starvation
proteins in E. coli. Not known to have GST activity. Similarity not previously recognized. Supported by HMM and manual
alignment inspection.
Alignment spans entire protein.
[0675] 226. GTP1/OBG family signature 

A widespread family of GTP-binding proteins has been recently characterized [1,2]. This family currently includes:

- Mouse and Xenopus protein DRG.
- Human protein DRG2.
- Drosophila protein 128up.
- Fission yeast protein gtp1.
- A Halobacterium cutirubrum hypothetical protein in a ribosomal protein gene cluster.
- Bacillus subtilis protein obg. Obg has been experimentally shown to bind GTP.
- Escherichia coli hypothetical protein yhbZ.
- Haemophilus influenzae hypothetical protein HI0877.
- Mycoplasma genitalium hypothetical protein MG384.
- Yeast hypothetical protein YAL036c (FUN11).
- Yeast hypothetical protein YGR173w.
- Caenorhabditis elegans hypothetical protein C02F5.3.

The function of the proteins that belong to this family is not yet known. They are polypeptides of about 40 to 48 Kd which
contain the five small sequence elements characteristic of GTP-binding proteins [3]. As a signature pattern the region
that corresponds to the ATP/GTP B motif (also called G-3 in GTP-binding proteins) was selected.
[0676] Consensus pattern: D-[LIVM]-P-G-[LIVM](2)-[DEY]-[GN]-A-x(2)-G-x-G

[ 1] Sazuka T., Tomooka Y., Ikawa Y., Noda M., Kumar S. Biochem. Biophys. Res. Commun. 189:363-370(1992).
[ 2] Hudson J.D., Young P.G. Gene 125:191-193(1993).
[ 3] Bourne H.R., Sanders D.A., McCormick F. Nature 349:117-127(1991).

[0677] 227. (GTP_EFTU1) 

ATP/GTP-binding site motif A (P-loop) . 

[0678] From sequence comparisons and crystallographic data analysis it has been shown [1,2,3,4,5,6] that an
appreciable proportion of proteins that bind ATP or GTP share a number of more or less conserved sequence motifs.
The best conserved of these motifs is a glycine-rich region, which typically forms a flexible loop between a beta-strand
and an alpha-helix. This loop interacts with one of the phosphate groups of the nucleotide. This sequence motif is
generally referred to as the 'A' consensus sequence [1] or the 'P-loop' [5]. There are numerous ATP- or GTP-binding
proteins in which the P-loop is found. Listed below are a number of protein families for which the relevance of the
presence of such a motif has been noted:

- ATP synthase alpha and beta subunits (see <PDOC00137>).
- Myosin heavy chains.
- Kinesin heavy chains and kinesin-like proteins (see <PDOC00343>).
- Dynamins and dynamin-like proteins (see <PDOC00362>).
- Guanylate kinase (see <PDOC00670>).
- Thymidine kinase (see <PDOC00524>).
- Thymidylate kinase (see <PDOC01034>).
- Shikimate kinase (see <PDOC00868>).
- Nitrogenase iron protein family (nifH/frxC) (see <PDOC00580>).
- ATP-binding proteins involved in 'active transport' (ABC transporters) [7] (see <PDOC00185>).
- DNA and RNA helicases [8,9,10].
- GTP-binding elongation factors (EF-Tu, EF-1alpha, EF-G, EF-2, etc.).
- Ras family of GTP-binding proteins (Ras, Rho, Rab, Ral, Ypt1, SEC4, etc.).
- Nuclear protein ran (see <PDOC00859>).
- ADP-ribosylation factors family (see <PDOC00781>).
- Bacterial dnaA protein (see <PDOC00771>).
- Bacterial recA protein (see <PDOC00131>).
- Bacterial recF protein (see <PDOC00539>).
- Guanine nucleotide-binding proteins alpha subunits (Gi, Gs, Gt, G0, etc.).
- DNA mismatch repair proteins mutS family (see <PDOC00388>).
- Bacterial type II secretion system protein E (see <PDOC00567>).

Not all ATP- or GTP-binding proteins are picked up by this motif. A number of proteins escape detection because the
structure of their ATP-binding site is completely different from that of the P-loop. Examples of such proteins are the
E1-E2 ATPases or the glycolytic kinases. In other ATP- or GTP-binding proteins the flexible loop exists in a slightly
different form; this is the case for tubulins or protein kinases. A special mention must be reserved for adenylate kinase,
in which there is a single deviation from the P-loop pattern: in the last position Gly is found instead of Ser or Thr.

Consensus pattern: [AG]-x(4)-G-K-[ST]- 
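
The consensus-pattern notation used throughout this listing maps closely onto regular expressions, so a pattern such as the P-loop can be tried against a sequence with a few lines of code. The following is a minimal sketch only: the helper name prosite_to_regex and the test sequence are illustrative assumptions, not part of the original disclosure, and only the notation elements that appear in this section are handled.

```python
import re

# Hypothetical helper (not from the source): translate the consensus-pattern
# notation used in this listing into a Python regular expression.
# Handled elements: single residues (G), sets ([ST]), exclusions ({P}),
# the wildcard x, and repeat counts such as (4) or (2,3).
def prosite_to_regex(pattern: str) -> str:
    parts = []
    for element in pattern.strip().strip("-").split("-"):
        m = re.fullmatch(
            r"(\[[A-Z]+\]|\{[A-Z]+\}|[A-Zx])(?:\((\d+)(?:,(\d+))?\))?", element
        )
        if m is None:
            raise ValueError(f"unrecognized pattern element: {element!r}")
        residue, lo, hi = m.groups()
        if residue == "x":                # x means "any residue"
            residue = "."
        elif residue.startswith("{"):     # {P} means "any residue except P"
            residue = "[^" + residue[1:-1] + "]"
        if lo is None:
            repeat = ""
        elif hi is None:
            repeat = "{" + lo + "}"
        else:
            repeat = "{" + lo + "," + hi + "}"
        parts.append(residue + repeat)
    return "".join(parts)

P_LOOP = "[AG]-x(4)-G-K-[ST]"
regex = prosite_to_regex(P_LOOP)

# Illustrative test sequence (an assumption chosen to contain a P-loop).
toy = "MTEYKLVVVGAGGVGKSALTIQLIQ"
hit = re.search(regex, toy)
print(regex, hit.group(), hit.start() + 1)  # pattern, matched segment, 1-based start
```

Because the pattern is short, hits by chance are expected in unrelated proteins, which is consistent with the remark above that the motif neither detects all nucleotide-binding proteins nor is exclusive to them.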

[ 1] Walker J.E., Saraste M., Runswick M.J., Gay N.J. EMBO J. 1:945-951(1982).
[ 2] Moller W., Amons R. FEBS Lett. 186:1-7(1985).
[ 3] Fry D.C., Kuby S.A., Mildvan A.S. Proc. Natl. Acad. Sci. U.S.A. 83:907-911(1986).
[ 4] Dever T.E., Glynias M.J., Merrick W.C. Proc. Natl. Acad. Sci. U.S.A. 84:1814-1818(1987).
[ 5] Saraste M., Sibbald P.R., Wittinghofer A. Trends Biochem. Sci. 15:430-434(1990).
[ 6] Koonin E.V. J. Mol. Biol. 229:1165-1174(1993).
[ 7] Higgins C.F., Hyde S.C., Mimmack M.M., Gileadi U., Gill D.R., Gallagher M.P. J. Bioenerg. Biomembr. 22:571-592(1990).
[ 8] Hodgman T.C. Nature 333:22-23(1988) and Nature 333:578-578(1988) (Errata).
[ 9] Linder P., Lasko P., Ashburner M., Leroy P., Nielsen P.J., Nishi K., Schnier J., Slonimski P.P. Nature 337:121-122(1989).
[10] Gorbalenya A.E., Koonin E.V., Donchenko A.P., Blinov V.M. Nucleic Acids Res. 17:4713-4730(1989).

[0679] GTP-binding elongation factors signature (GTP_EFTU2) 

Elongation factors [1,2] are proteins catalyzing the elongation of peptide chains in protein biosynthesis. In both
prokaryotes and eukaryotes, there are three distinct types of elongation factors, as described in the following table:

Eukaryotes   Prokaryotes   Function
EF-1alpha    EF-Tu         Binds GTP and an aminoacyl-tRNA; delivers the latter to the A site of ribosomes.
EF-1beta     EF-Ts         Interacts with EF-1a/EF-Tu to displace GDP and thus allows the regeneration of GTP-EF-1a.
EF-2         EF-G          Binds GTP and peptidyl-tRNA and translocates the latter from the A site to the P site.

The GTP-binding elongation factor family also includes the following proteins:

- Eukaryotic peptide chain release factor GTP-binding subunits [3]. These proteins interact with release factors that
  bind to ribosomes that have encountered a stop codon at their decoding site and help them to induce release of the
  nascent polypeptide. The yeast protein was known as SUP2 (and also as SUP35, SUF12 or GST1) and the human
  homolog as GST1-Hs.
- Prokaryotic peptide chain release factor 3 (RF-3) (gene prfC). RF-3 is a class-II RF, a GTP-binding protein that
  interacts with class I RFs (see <PDOC00607>) and enhances their activity [4].
- Prokaryotic GTP-binding protein lepA and its homolog in yeast (gene GUF1) and in Caenorhabditis elegans (ZK1236.1).
- Yeast HBS1 [5].
- Rat statin S1 [6], a protein of unknown function which is highly similar to EF-1alpha.
- Prokaryotic selenocysteine-specific elongation factor selB [7], which seems to replace EF-Tu for the insertion of
  selenocysteine directed by the UGA codon.
- The tetracycline resistance proteins tetM/tetO [8,9] from various bacteria such as Campylobacter jejuni, Enterococcus
  faecalis, Streptococcus mutans and Ureaplasma urealyticum. Tetracycline binds to the prokaryotic ribosomal 30S
  subunit and inhibits binding of aminoacyl-tRNAs. These proteins abolish the inhibitory effect of tetracycline on
  protein synthesis.
- Rhizobium nodulation protein nodQ [10].
- Escherichia coli hypothetical protein yihK [11].

In EF-1-alpha, a specific region has been shown [12] to be involved in a conformational change mediated by the
hydrolysis of GTP to GDP. This region is conserved in both EF-1alpha/EF-Tu as well as EF-2/EF-G and thus seems
typical for GTP-dependent proteins which bind non-initiator tRNAs to the ribosome. The pattern developed for this
family of proteins includes that conserved region.

[0680] Consensus pattern: D-[KRSTGANQFYW]-x(3)-E-[KRAQ]-x-[RKQD]-[GC]-[IVMK]-[ST]-[IV]-x(2)-[GSTACKRNQ]-

[ 1] Concise Encyclopedia Biochemistry, Second Edition, Walter de Gruyter, Berlin New-York (1988).
[ 2] Moldave K. Annu. Rev. Biochem. 54:1109-1149(1985).
[ 3] Stansfield I., Jones K.M., Kushnirov V.V., Dagkesamanskaya A.R., Poznyakovski A.I., Paushkin S.V., Nierras
C.R., Cox B.S., Ter-Avanesyan M.D., Tuite M.F. EMBO J. 14:4365-4373(1995).
[ 4] Grentzmann G., Brechemier-Baey D., Heurgue-Hamard V., Buckingham R.H. J. Biol. Chem. 270:10595-10600(1995).
[ 5] Nelson R.J., Ziegelhoffer T., Nicolet C., Werner-Washburne M., Craig E.A. Cell 71:97-105(1992).
[ 6] Ann D.K., Moutsatsos I.K., Nakamura T., Lin H.H., Mao P.-L., Lee M.-J., Chin S., Liem R.K.H., Wang E. J. Biol.
Chem. 266:10429-10437(1991).
[ 7] Forchhammer K., Leinfelder W., Bock A. Nature 342:453-456(1989).
[ 8] Manavathu E.K., Hiratsuka K., Taylor D.E. Gene 62:17-26(1988).
[ 9] Leblanc D.J., Lee L.N., Titmas B.M., Smith C.J., Tenover F.C. J. Bacteriol. 170:3618-3626(1988).
[10] Cervantes E., Sharma S.B., Maillet F., Vasse J., Truchet G., Rosenberg C. Mol. Microbiol. 3:745-755(1989).
[11] Plunkett G. III, Burland V.D., Daniels D.L., Blattner F.R. Nucleic Acids Res. 21:3391-3398(1993).
[12] Moller W., Schipper A., Amons R. Biochimie 69:983-989(1987).

[0681] 228. GTP cyclohydrolase II. 
GTP cyclohydrolase II catalyses the first committed step in the biosynthesis of riboflavin.

[0682] [1] Richter G, Ritz H, Katzenmeier G, Volk R, Kohnle A, Lottspeich F, Allendorf D, Bacher A, J.Bacteriol 1993; 
175:4045-4051. 

[0683] 229. Galactose-1-phosphate uridyl transferase signatures (GalP_UDP_transf)

Galactose-1-phosphate uridyl transferase (EC 2.7.7.10) (galT) catalyzes the transfer of an uridyldiphosphate group on
galactose (or glucose) 1-phosphate. During the reaction, the uridyl moiety links to a histidine residue. In the Escherichia
coli enzyme, it has been shown [1] that two histidine residues separated by a single proline residue are essential for
enzyme activity. On the basis of sequence similarities, two apparently unrelated families seem to exist. Class-I enzymes
are found in eukaryotes as well as some bacteria such as Escherichia coli or Streptomyces lividans, while class-II
enzymes have been found so far only in bacteria such as Bacillus subtilis or Lactobacillus helveticus [2]. Signature
patterns for both families were developed. For class-I enzymes the signature is based on the active site residues. For
class-II enzymes a region which also includes two conserved histidines was chosen.
Consensus pattern: F-E-N-[RK]-G-x(3)-G-x(4)-H-P-H-x-Q [The two H's are the active site residues]
[0684] Consensus pattern: D-L-P-I-V-G-G-[ST]-[LIVM](2)-[SA]-H-[DEN]-H-[FY]-Q-G-G. Note: class-I enzymes are
structurally related to the HIT family of proteins (see <PDOC00694>).
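
Because the class-I signature is anchored on the His-Pro-His active site, a match can also report where the two histidines fall in the sequence. The sketch below is illustrative only: the regular expression is a hand translation of the class-I consensus pattern above, and the function name and the toy sequence are assumptions made for the example, not real galT data.

```python
import re

# Hand translation of F-E-N-[RK]-G-x(3)-G-x(4)-H-P-H-x-Q; the two active-site
# histidines are captured as named groups so their positions can be reported.
CLASS_I = re.compile(r"FEN[RK]G.{3}G.{4}(?P<his1>H)P(?P<his2>H).Q")

def active_site_histidines(sequence):
    """Return 1-based (His, His) positions for every match of the class-I pattern."""
    return [
        (m.start("his1") + 1, m.start("his2") + 1)
        for m in CLASS_I.finditer(sequence)
    ]

# Hypothetical sequence constructed to satisfy the pattern (not a real protein).
toy = "AAFENKGAAAGSSSSHPHGQLL"
positions = active_site_histidines(toy)
print(positions)
```

Reporting residue positions rather than a bare yes/no makes it straightforward to check a hit against experimental data on the essential histidines.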

[ 1] Reichardt J.K.V., Berg P. Nucleic Acids Res. 16:9017-9026(1988).
[ 2] Mollet B., Pilloud N. J. Bacteriol. 173:4464-4473(1991).

[0685] 230. Gamma-thionins family signature 
[0686] The following small plant proteins are evolutionarily related:

- Gamma-thionins from wheat endosperm (gamma-purothionins) and barley (gamma-hordothionins) which are toxic
  to animal cells and inhibit protein synthesis in cell free systems [1].
- A flower-specific thionin (FST) from tobacco [2].
- Antifungal proteins (AFP) from the seeds of Brassicaceae species such as radish, mustard, turnip and Arabidopsis
  thaliana [3].
- Inhibitors of insect alpha-amylases from sorghum [4].
- Probable protease inhibitor P322 from potato.
- A germination-related protein from cowpea [5].
- Anther-specific protein SF18 from sunflower [6]. SF18 is a protein that contains a gamma-thionin domain at its
  N-terminus and a proline-rich C-terminal domain.
- Soybean sulfur-rich protein SE60 [7].
- Vicia faba antibacterial peptides fabatin-1 and -2.

[0687] In their mature form, these proteins generally consist of about 45 to 50 amino-acid residues. As shown in the
following schematic representation, these peptides contain eight conserved cysteines involved in disulfide bonds.

xxCxxxxxxxxxxCxxxxxCxxxCxxxxxxxxxCxxxxxxCxCxxxC

[Schematic: the disulfide-bond connector lines and the '*' pattern-position markers are not recoverable.]

'C': conserved cysteine involved in a disulfide bond.
'*': position of the pattern.
[0688] Consensus pattern: [KRG]-x-C-x(3)-[SV]-x(2)-[FYWH]-x-[GF]-x-C-x(5)-C-x(3)-C [The four C's are involved in
disulfide bonds]

[1] Bruix M., Jimenez M.A., Santoro J., Gonzalez C., Colilla F.J., Mendez E., Rico M. Biochemistry 32:715-724(1993).
[2] Gu Q., Kawata E.E., Morse M.-J., Wu H.-M., Cheung A.Y. Mol. Gen. Genet. 234:89-96(1992).
[3] Terras F.R.G., Torrekens S., van Leuven F., Osborn R.W., Vanderleyden J., Cammue B.P.A., Broekaert W.F.
FEBS Lett. 316:233-240(1993).
[4] Bloch C. Jr., Richardson M. FEBS Lett. 279:101-104(1991).
[5] Ishibashi N., Yamauchi D., Minamikawa T. Plant Mol. Biol. 15:59-64(1990).
[7] Choi Y., Choi Y.D., Lee J.S. Plant Physiol. 101:699-700(1993).

[0689] 231. Gelsolin. Gelsolin repeat. Number of members: 170 

[0690] [1] Medline: 97433077. The crystal structure of plasma gelsolin: implications for actin severing, capping, and
nucleation. Burtnick LD, Koepf EK, Grimes J, Jones EY, Stuart DI, McLaughlin PJ, Robinson RC; Cell 1997;90:661-670.

[0691] 232. Germin family signature

Germins [1] are a family of homopentameric cereal glycoproteins expressed during germination which may play a role
in altering the properties of cell walls during germinative growth. It has been shown that wheat and barley germins act
as oxalate oxidases (EC 1.2.3.4), an enzyme that catalyzes the oxidative degradation of oxalate to carbonate and
hydrogen peroxide. Germins are highly similar to:

- Germin-like proteins from various plants such as rape, violet or white mustard.
- Slime mold spherulins 1a and 1b, which are proteins that accumulate specifically during spherulation, a process
  induced by various forms of environmental stress which leads to encystment and dormancy.

As a signature pattern the best conserved region was selected: a decapeptide located in the central section of these
proteins.
[0692] Consensus pattern: G-x(4)-H-x-H-P-x-A-x-E-[LIVM]-
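
A signature of this kind is typically used to screen a set of candidate sequences. The following sketch assumes the candidates are supplied as FASTA-formatted text; the parser, the record names and both toy sequences are hypothetical illustrations, and the regular expression is a hand translation of the germin consensus pattern above.

```python
import re

# Hand translation of G-x(4)-H-x-H-P-x-A-x-E-[LIVM].
GERMIN = re.compile(r"G.{4}H.HP.A.E[LIVM]")

def parse_fasta(text):
    """Parse FASTA-formatted text into a dict of {record name: sequence}."""
    records, name, chunks = {}, None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                records[name] = "".join(chunks)
            name, chunks = line[1:].split()[0], []
        else:
            chunks.append(line)
    if name is not None:
        records[name] = "".join(chunks)
    return records

# Two made-up records: the second differs from the first by one residue
# (P -> A inside the H-x-H-P core), which is enough to abolish the match.
FASTA = """>toy_hit hypothetical sequence
MAAGTNPPHTHPRATEIL
>toy_miss hypothetical sequence
MAAGTNPPHTHARATEIL
"""

matching = [name for name, seq in parse_fasta(FASTA).items() if GERMIN.search(seq)]
print(matching)
```

The single-residue difference between the two records shows how strictly the fixed positions of a consensus pattern are enforced.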
[0693] [1] Lane B.G. FASEB J. 8:294-301(1994).

[0694] 233. (GlutR)

Glutamyl-tRNA reductase signature

[0695] Delta-aminolevulinic acid (ALA) is the obligatory precursor for the synthesis of all tetrapyrroles including
porphyrin derivatives such as chlorophyll and heme. ALA can be synthesized via two different pathways: the Shemin (or
C4) pathway, which involves the single-step condensation of succinyl-CoA and glycine and which is catalyzed by ALA
synthase (EC 2.3.1.37), and the C5 pathway from the five-carbon skeleton of glutamate. The C5 pathway operates
in the chloroplast of plants and algae, in cyanobacteria, in some eubacteria and in archaebacteria.
[0696] The initial step in the C5 pathway is carried out by glutamyl-tRNA reductase (GluTR) [1], which catalyzes the
NADP-dependent conversion of glutamate-tRNA(Glu) to glutamate-1-semialdehyde (GSA) with the concomitant
release of tRNA(Glu), which can then be recharged with glutamate by glutamyl-tRNA synthetase.

[0697] GluTR is a protein of about 50 Kd (467 to 550 residues) which contains a few conserved regions. The best
conserved region is located in positions 99 to 122 in the sequence of known GluTR. This region seems important for
the activity of the enzyme. We have developed a signature pattern from that conserved region.

[0698] Consensus pattern: H-[LIVM]-x(2)-[LIVM]-[GSTAC](3)-[LIVM]-[DEQ]-S-[LIVMA]-[LIVM](2)-[GF]-E-x-[EQR]-
[IV]-[LIT]-[STAG]-Q-[LIVM]-[KR] Sequences known to belong to this class detected by the pattern: ALL.
[0699] [1] Jahn D., Verkamp E., Soell D. Trends Biochem. Sci. 17:215-218(1992).
[0700] 234. (Glycoprotease) 

Glycoprotease family signature (aka Peptidase_M22)

[0701] Glycoprotease (GCP) (EC 3.4.24.57) [1], or O-sialoglycoprotein endopeptidase, is a metalloprotease secreted
by Pasteurella haemolytica which specifically cleaves O-sialoglycoproteins such as glycophorin A. The sequence of
GCP is highly similar to the following uncharacterized proteins:

- Escherichia coli hypothetical protein ygjD (ORF-X).
- Bacillus subtilis hypothetical protein ydiE.
- Mycobacterium leprae hypothetical protein U229E.
- Mycobacterium tuberculosis hypothetical protein MtCY78.10.
- Synechocystis strain PCC 6803 hypothetical protein slr0807.
- Methanococcus jannaschii hypothetical protein MJ1130.
- Haloarcula marismortui hypothetical protein in HSH 3' region.
- Yeast hypothetical protein YKR038c.
- Yeast hypothetical protein QRI7.

[0702] One of the conserved regions contains two conserved histidines. It is possible that this region is involved in
coordinating a metal ion such as zinc.
[0721] [1] Medline: 95252686. A family of UDP-GlcNAc/MurNAc: polyisoprenol-P GlcNAc/MurNAc-1-P transferases.
Lehrman MA; Glycobiology 1994;4:768-771.

[0722] 241. Glycosyl hydrolases family 15. 21 members. 

[0723] 242. Glycosyl hydrolases family 16 signature 

It has been shown [1] that the following glycosyl hydrolases can be classified into a single family on the basis of
sequence similarities:

- Bacterial beta-1,3-1,4-glucanases, or lichenases (EC 3.2.1.73), mainly from Bacillus but also from Clostridium
  thermocellum (gene licB), Fibrobacter succinogenes and Rhodothermus marinus (gene bglA).
- Bacillus circulans beta-1,3-glucanase A1 (EC 3.2.1.39) (gene glcA).
- Laminarinase (EC 3.2.1.6) from Clostridium thermocellum (gene lam1).
- Streptomyces coelicolor agarase (EC 3.2.1.81) (gene dagA).
- Alteromonas carrageenovora kappa-carrageenase (EC 3.2.1.83) (gene cgkA).

Two closely clustered conserved glutamates have been shown [2] to be involved in the catalytic activity of Bacillus
licheniformis lichenase. The region that contains these residues was used as a signature pattern.

[0724] Consensus pattern: E-[LIV]-D-[LIV]-x(0,1)-E-x(2)-[GQ]-[KRNF]-x-[PSTA] [The two E's are active site residues]

[ 1] Henrissat B. Biochem. J. 280:309-316(1991).
[ 2] Juncosa M., Pons J., Dot T., Querol E., Planas A. J. Biol. Chem. 269:14530-14535(1994).

[0725] 243. Glycosyl hydrolases family 17 signature

It has been shown [1,2] that the following glycosyl hydrolases can be classified into a single family on the basis of
sequence similarities:

- Glucan endo-1,3-beta-glucosidases (EC 3.2.1.39) (endo-(1->3)-beta-glucanase) from various plants. This enzyme
  may be involved in the defense of plants against pathogens through its ability to degrade fungal cell wall
  polysaccharides.
- Glucan 1,3-beta-glucosidase (EC 3.2.1.58) (exo-(1->3)-beta-glucanase) from yeast (gene BGL2). This enzyme may
  play a role in cell expansion during growth, in cell-cell fusion during mating, and in spore release during sporulation.
- Lichenases (EC 3.2.1.73) (endo-(1->3,1->4)-beta-glucanase) from various plants.

The best conserved region in the sequence of these enzymes is located in their central section. This region contains a
conserved tryptophan residue which could be involved in the interaction with the glucan substrates [2] and it also
contains a conserved glutamate which has been shown [3] to act as the nucleophile in the catalytic mechanism. This
region was used as a signature pattern.

Consensus pattern: [LIVM]-x-[LIVMFYWA](3)-[STAG]-E-[STA]-G-W-P-[STN]-x-[SAGQ] [E is an active site residue]

[ 1] Henrissat B. Biochem. J. 280:309-316(1991).
[ 2] Ori N., Sessa G., Lotan T., Himmelhoch S., Fluhr R. EMBO J. 9:3429-3436(1990).
[ 3] Varghese J.N., Garrett T.P.J., Colman P.M., Chen L., Hoj P.J., Fincher G.B. Proc. Natl. Acad. Sci. U.S.A. 91:
2785-2789(1994).

[0726] 244. Glyoxalase I signatures 

Glyoxalase I (EC 4.4.1.5) (lactoylglutathione lyase) catalyzes the first step of the glyoxal pathway, the transformation
of methylglyoxal and glutathione into S-lactoylglutathione, which is then converted by glyoxalase II to lactic acid [1].
Glyoxalase I is a ubiquitous enzyme which binds one mole of zinc per subunit. The bacterial and yeast enzymes are
monomeric while the mammalian one is homodimeric. The sequence of glyoxalase I is well conserved. In bacteria and
mammals, the enzyme is a protein of about 130 to 180 residues while in fungi it is about twice as long. In these organisms
the enzyme is built out of the tandem repeat of a homologous domain. Two signature patterns for this family were
derived. The first one is located in the N-terminal region while the second one is located in the central section of the
protein and contains a conserved histidine that could be implicated in the binding of the zinc atom.

[0727] Consensus pattern: [HQ]-[IVT]-x-[LIVFY]-x-[IV]-x(5)-[STA]-x(2)-F-[YM]-x(2,3)-[LMF]-G-[LMF]-
Consensus pattern: G-[NTKQ]-x(0,5)-[GA]-[LVFY]-[GH]-H-[IVF]-[CGA]-x-[STAGLE]-x(2)-[DNC]-
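
When a family is described by two signatures, a sequence can be screened against both, and the variable-length gaps x(2,3) and x(0,5) translate directly into bounded repeats. The sketch below is illustrative only: the two regular expressions are hand translations of the patterns above, while the dictionary keys, the function name and the toy sequence are assumptions introduced for the example.

```python
import re

# Hand translations of the two glyoxalase I consensus patterns.
# x(2,3) becomes .{2,3} and x(0,5) becomes .{0,5}.
SIGNATURES = {
    "N-terminal": re.compile(r"[HQ][IVT].[LIVFY].[IV].{5}[STA].{2}F[YM].{2,3}[LMF]G[LMF]"),
    "central": re.compile(r"G[NTKQ].{0,5}[GA][LVFY][GH]H[IVF][CGA].[STAGLE].{2}[DNC]"),
}

def scan(sequence):
    """Return the first matching segment per signature, or None if absent."""
    hits = {}
    for name, regex in SIGNATURES.items():
        m = regex.search(sequence)
        hits[name] = m.group() if m else None
    return hits

# Hypothetical sequence assembled so that both signatures occur once.
toy = "MM" + "HIALAIAAAAASAAFYAALGL" + "KK" + "GNGLGHIGASAAD" + "EE"
hits = scan(toy)
has_both = all(segment is not None for segment in hits.values())
print(hits, has_both)
```

Requiring both signatures is a stricter test than either alone; whether that is appropriate for a given screen is a design choice, since domain-duplicated or partial sequences may carry only one of the two regions.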
[0728] [ 1] Kim N.-S., Umezawa Y., Ohmura S., Kato S. J. Biol. Chem. 268:11217-11221(1993).
[0729] 245. (Glypican) 

Glypicans signature

Glypicans [1,2] are a family of heparan sulfate proteoglycans which are anchored to cell membranes by a
glycosylphosphatidylinositol (GPI) linkage. Structurally, these proteins consist of three separate domains:

a) A signal sequence;
b) An extracellular domain of about 500 residues that contains 12 conserved cysteines probably involved in disulfide
bonds and which also contains the sites of attachment of the heparan sulfate glycosaminoglycan side chains;
c) A C-terminal hydrophobic region which is post-translationally removed after formation of the GPI-anchor.

[0730] The proteins known to belong to this family are:

- Glypican 1 (GPC1).
- Glypican 2 (GPC2) or cerebroglycan.
- Glypican 3 (GPC3) or OCI-5. In man, defects in GPC3 are the cause of an X-linked genetic disease,
  Simpson-Golabi-Behmel syndrome (SGBS).
- K-glypican.
- Glypican 5 (GPC5).
- Drosophila protein dally.

[0731] The signature pattern that was developed for glypicans is located in the central section of the extracellular
domain and contains five of the conserved cysteines.

[0732] Consensus pattern: C-x(2)-C-x-G-[LIVM]-x(4)-P-C-x(2)-[FY]-C-x(2)-[LIVM]-x(2)-G-C [The C's are probably
involved in disulfide bonds] Sequences known to belong to this class detected by the pattern: ALL, except for dally.

[ 1] Weksberg R., Squire J.A., Templeton D.M. Nat. Genet. 12:225-227(1996).
[ 2] Watanabe K., Yamada H., Yamaguchi Y. J. Cell Biol. 130:1207-1218(1995).

[0733] 246. Granins signatures 

Granins (chromogranins or secretogranins) [1] are a family of acidic proteins present in the secretory granules of a
wide variety of endocrine and neuro-endocrine cells. The exact function(s) of these proteins is not yet known but they
seem to be the precursors of biologically active peptides and/or they may act as helper proteins in the packaging of
peptide hormones and neuropeptides. Three members of this family of proteins show some sequence similarities:

- Chromogranin A (CGA) [2]. CGA is a protein of about 420 residues; it is the precursor of the peptide pancreastatin,
  which strongly inhibits glucose-induced insulin release from the pancreas.
- Secretogranin 1 (chromogranin B). A sulfated protein of about 600 residues.
- Secretogranin 2 (chromogranin C). A sulfated protein of about 650 residues.

Apart from their subcellular location and the abundance of acidic residues (Asp and Glu), these proteins do not share
many structural similarities. Only one short region, located in the C-terminal section, is conserved in all these proteins.
Chromogranins A and B share a region of high similarity in their N-terminal section; this region includes two cysteine
residues involved in a disulfide bond.

[0734] Consensus pattern: [DE]-[SN]-L-[SAN]-x(2)-[DE]-x-E-L-
Consensus pattern: C-[LIVM](2)-E-[LIVM](2)-S-[DN]-[STA]-L-x-K-x-S-x(3)-[LIVM]-[STA]-x-E-C [The two C's are linked
by a disulfide bond]

[ 1] Huttner W.B., Gerdes H.-H., Rosa P. Trends Biochem. Sci. 16:27-30(1991).
[ 2] Simon J.-P., Aunis D. Biochem. J. 262:1-13(1989).

[0735] 247. grpE protein signature 

In prokaryotes the grpE protein [1] stimulates, jointly with dnaJ, the ATPase activity of the dnaK chaperone. It seems
to accelerate the release of ADP from dnaK, thus allowing dnaK to recycle more efficiently. GrpE is a protein of about
22 to 25 Kd. In yeast, an evolutionarily related mitochondrial protein (gene GRPE) has been shown [2] to associate with
the mitochondrial hsp70 protein and thus to play a role in the import of proteins from the cytoplasm. As a signature
pattern, the most conserved region of grpE was selected. It is located in the C-terminal section.
[0736] Consensus pattern: [FLMDN]-[PHEA]-x(2)-[HM]-x-A-[LIVMTN]-x(16,20)-G-[FY]-x(3)-[DEG]-x(2)-[LIVM]-
[RI]-x-[SA]-x-V-x-[IV]-

[ 1] Georgopoulos C., Welch W. Annu. Rev. Cell Biol. 9:601-635(1993).
[ 2] Bolliger L., Deloche O., Glick B.S., Georgopoulos C., Jenoe P., Kronidou N., Horst M., Morishima N., Schatz
G. EMBO J. 13:1998-2006(1994).

[0737] 248. Guanylate kinase signature and profile

Guanylate kinase (EC 2.7.4.8) (GK) [1] catalyzes the ATP-dependent phosphorylation of GMP into GDP. It is essential
for recycling GMP and, indirectly, cGMP. In prokaryotes (such as Escherichia coli), lower eukaryotes (such as yeast)
and in vertebrates GK is a highly conserved monomeric protein of about 200 amino acids. GK has been shown [2,3,4]
to be structurally similar to the following proteins: - Protein A57R (or SalG2R) from various strains of Vaccinia virus.
This protein is highly similar to GK, but contains a frameshift mutation in the N-terminal section and could therefore be
inactive in that virus. The following proteins are characterized by the presence in their sequence of one or more copies
of the DHR domain, a SH3 domain (see <PDOC50002>) as well as a C-terminal GK-like domain; these proteins are
[0738] Consensus pattern: T-[ST]-R-x(2)-[KH]

[ 2] Carey A.T., Holt K., ncdiu 

[ 4] Henrissat B., Callebaut I., Fabrega S., Lenn 
7090-7094(1995). 

[0748] 251. (Glyco_hydro_17)
Glycosyl hydrolases family 17 signature

(aka glycosyl_hydro4)

[0749] It has been shown [1,2] that the following glycosyl hydrolases can be classified into a single family on the
basis of sequence similarities:

- Glucan endo-1,3-beta-glucosidases (EC 3.2.1.39) (endo-(1->3)-beta-glucanase) from various plants. This enzyme
  may be involved in the defense of plants against pathogens through its ability to degrade fungal cell wall
  polysaccharides.
- Glucan 1,3-beta-glucosidase (EC 3.2.1.58) (exo-(1->3)-beta-glucanase) from yeast (gene BGL2). This enzyme
  may play a role in cell expansion during growth, in cell-cell fusion during mating, and in spore release during
  sporulation.
- Lichenases (EC 3.2.1.73) (endo-(1->3,1->4)-beta-glucanase) from various plants.

[0750] The best conserved region in the sequence of these enzymes is located in their central section. This region
contains a conserved tryptophan residue which could be involved in the interaction with the glucan substrates [2] and
it also contains a conserved glutamate which has been shown [3] to act as the nucleophile in the catalytic mechanism.
This region was used as a signature pattern.
[0751] Consensus pattern: [LIVM]-x-[LIVMFYWA](3)-[STAG]-E-[STA]-G-W-P-[STN]-x-[SAGQ] [E is an active site
residue] Sequences known to belong to this class detected by the pattern: ALL.

[ 1] Henrissat B. Biochem. J. 280:309-316(1991).
[ 2] Ori N., Sessa G., Lotan T., Himmelhoch S., Fluhr R. EMBO J. 9:3429-3436(1990).
[ 3] Varghese J.N., Garrett T.P.J., Colman P.M., Chen L., Hoj P.J., Fincher G.B. Proc. Natl. Acad. Sci. U.S.A. 91:
2785-2789(1994).

[0752] 252. (Glyco_hydro_3) 
Glycosyl hydrolases family 3 active site 

[0753] It has been shown [1,2] that the following glycosyl hydrolases can be, on the basis of sequence similarities,
classified into a single family:

- Beta glucosidases (EC 3.2.1.21) from the fungi Aspergillus wentii (A-3), Hansenula anomala, Kluyveromyces
  fragilis, Saccharomycopsis fibuligera (BGL1 and BGL2), Schizophyllum commune and Trichoderma reesei (BGL1).
- Beta glucosidases from the bacteria Agrobacterium tumefaciens (Cbg1), Butyrivibrio fibrisolvens (bglA),
  Clostridium thermocellum (bglB), Escherichia coli (bglX), Erwinia chrysanthemi (bgxA) and Ruminococcus albus.
- Alteromonas strain O-7 beta-hexosaminidase A (EC 3.2.1.52).
- Bacillus subtilis hypothetical protein yzbA.
- Escherichia coli hypothetical protein ycfO and HI0959, the corresponding Haemophilus influenzae protein.

[0754] One of the conserved regions in these enzymes is centered on a conserved aspartic acid residue which has
been shown [3], in Aspergillus wentii beta-glucosidase A3, to be implicated in the catalytic mechanism. This region
was used as a signature pattern.

[0755] Consensus pattern: [LIVM](2)-[KR]-x-[EQK]-x(4)-G-[LIVMFT]-[LIVT]-[LIVMF]-[ST]-D-x(2)-[SGADNI] [D is the
active site residue] Sequences known to belong to this class detected by the pattern: ALL.

[ 1] Henrissat B. Biochem. J. 280:309-316(1991).
[ 2] Castle L.A., Smith K.D., Morris R.O. J. Bacteriol. 174:1478-1486(1992).
[ 3] Bause E., Legler G. Biochim. Biophys. Acta 626:459-465(1980).

[0756] 253. (Glyco_hydro_28)

Polygalacturonase active site (aka PG)

[0757] Polygalacturonase (EC 3.2.1.15) (PG) (pectinase) [1,2] catalyzes the random hydrolysis of
1,4-alpha-D-galactosiduronic linkages in pectate and other galacturonans. In fruit, polygalacturonase plays an important
role in cell wall metabolism during ripening. In plant bacterial pathogens such as Erwinia carotovora or Pseudomonas
solanacearum and fungal pathogens such as Aspergillus niger, polygalacturonase is involved in maceration and
soft-rotting of plant tissue.

[0758] Exo-poly-alpha-D-galacturonosidase (EC 3.2.1.82) (exoPG) [3] hydrolyzes pectic acid from the non-reducing
end, releasing digalacturonate.
acting as a nucleophile. This region was used as a signature pattern. As a second signature pattern we selected a
conserved region, found in the N-terminal extremity of these enzymes; this region also contains a glutamic acid residue.
[0769] Consensus pattern: [LIVMFSTC]-[LIVFYS]-[LIV]-[LIVMST]-E-N-G-[LIVMFAR]-[CSAGN] [E is the active site
residue] Sequences known to belong to this class detected by the pattern: ALL.

[0770] Note: this pattern will pick up the last two domains of LPH; the first two domains, which are removed from the
LPH precursor by proteolytic processing, have lost the active site glutamate and may therefore be inactive [4].
[0771] Consensus pattern: F-x-[FYWM]-[GSTA]-x-[GSTA]-x-[GSTA](2)-[FYNH]-[NQ]-x-E-x-[GSTA] Sequences
known to belong to this class detected by the pattern: ALL.
[0772] Note: this pattern will pick up the last three domains of LPH.

[ 1] Henrissat B. Biochem. J. 280:309-316(1991).
[ 2] Henrissat B. Protein Seq. Data Anal. 4:61-62(1991).
[ 3] Gonzalez-Candelas L., Ramon D., Polaina J. Gene 95:31-38(1990).
[ 4] El Hassouni M., Henrissat B., Chippaux M., Barras F. J. Bacteriol. 174:765-777(1992).
[ 5] Withers S.G., Warren R.A.J., Street I.P., Rupitz K., Kempton J.B., Aebersold R. J. Am. Chem. Soc. 112:
5007-5009(1990).


[0773] 256. Glyco_hydro_20
Glycosyl hydrolase family 20
Previous Pfam IDs: glycosyl_hydr11;
Number of members: 33
[0774] 257. (Glyco_hydro_9)
Glycosyl hydrolases family 9 active sites signatures

(aka Glycosyl_hydr12)

[0775] The microbial degradation of cellulose and xylans requires several types of enzymes such as endoglucanases
(EC 3.2.1.4), cellobiohydrolases (EC 3.2.1.91) (exoglucanases), or xylanases (EC 3.2.1.8) [1,2]. Fungi and bacteria
produce a spectrum of cellulolytic enzymes (cellulases) and xylanases which, on the basis of sequence similarities,
can be classified into families. One of these families is known as the cellulase family E [3] or as the glycosyl hydrolases
family 9 [4,E1]. The enzymes which are currently known to belong to this family are listed below.


- Butyrivibrio fibrisolvens cellodextrinase 1 (ced1).
- Cellulomonas fimi endoglucanases B (cenB) and C (cenC).
- Clostridium cellulolyticum endoglucanase G (celCCG).
- Clostridium cellulovorans endoglucanase C (engC).
- Clostridium stercorarium endoglucanase Z (avicelase I) (celZ).
- Clostridium thermocellum endoglucanases D (celD), F (celF) and I (celI).
- Fibrobacter succinogenes endoglucanase A (endA).
- Pseudomonas fluorescens endoglucanase A (celA).
- Streptomyces reticuli endoglucanase 1 (cel1).
- Thermomonospora fusca endoglucanase E-4 (celD).
- Dictyostelium discoideum spore germination specific endoglucanase 270-6. This slime mold enzyme may digest
the spore cell wall during germination, to release the enclosed amoeba.
- Endoglucanases from plants such as avocado or French bean. In plants this enzyme may be involved in the fruit
ripening process.

[0776] Two of the most conserved regions in these enzymes are centered on conserved residues which have been
shown [5,6], in the endoglucanase D from Clostridium thermocellum, to be important for the catalytic activity. The
first region contains an active site histidine and the second region contains two catalytically important residues: an
aspartate and a glutamate. Both regions were used as signature patterns.

[0777] Consensus pattern: [STV]-x-[LIVMFY]-[STV]-x(2)-G-x-[NKR]-x(4)-[PLIVM]-H-x-R [H is an active site residue]
Sequences known to belong to this class detected by the pattern: ALL, except for Cellulomonas fimi cenC and
Streptomyces reticuli cel1.
[0778] Consensus pattern: [FYW]-x-D-x(4)-[FYW]-x(3)-E-x-[STA]-x(3)-N-[STA] [D and E are active site residues]
Sequences known to belong to this class detected by the pattern: ALL, except for Fibrobacter succinogenes endA, whose
sequence seems to be incorrect.

[ 1] Beguin P. Annu. Rev. Microbiol. 44:219-248(1990). 
[ 2] Gilkes N.R., Henrissat B., Kilburn D.G., Miller R.C. Jr., Warren R.A.J. Microbiol. Rev. 55:303-315(1991).
[ 3] Henrissat B., Claeyssens M., Tomme P., Lemesle L., Mornon J.-P. Gene 81:83-95(1989).
[ 4] Henrissat B. Biochem. J. 280:309-316(1991).

[ 5] Tomme P., Chauvaux S., Beguin P., Millet J., Aubert J.-P., Claeyssens M. J. Biol. Chem. 266:10313-10318
(1991).

[ 6] Tomme P., van Beeumen J., Claeyssens M. Biochem. J. 285:319-324(1992). 
[0779] 258. Matrix protein (MA), p15 (GAG_ma) 

[0780] The matrix protein, p15, is encoded by the gag gene. MA is involved in pathogenicity [1]. 
[0781] [1] Pozsgay JM, Beilharz MW, Wines BD, Hess AD, Pitha PM, J Virol 1993;67:5989-5999.
[0782] 259. Gag polyprotein, inner coat protein p12 (GAG_P12)

[0783] The retroviral p12 is a virion structural protein. p12 is proline rich. The function carried out by p12 in assembly
and replication is unknown. p12C is associated with pathogenicity of the virus.

[1] Pozsgay JM, Beilharz MW, Wines BD, Hess AD, Pitha PM, J Virol 1993;67:5989-5999. 

[0784] 260. Glutamine synthetase signatures (GLN-SYNT)

Glutamine synthetase (EC 6.3.1.2) (GS) [1] plays an essential role in the metabolism of nitrogen by catalyzing the
condensation of glutamate and ammonia to form glutamine. There seem to be three different classes of GS [2,3,4]: -
Class I enzymes (GSI) are specific to prokaryotes, and are oligomers of 12 identical subunits. The activity of GSI-type
enzyme is controlled by the adenylation of a tyrosine residue. The adenylated enzyme is inactive. - Class II enzymes
(GSII) are found in eukaryotes and in bacteria belonging to the Rhizobiaceae, Frankiaceae, and Streptomycetaceae
families (these bacteria also have a class-I GS). GSII are octamers of identical subunits. Plants have two or more
isozymes of GSII; one of the isozymes is translocated into the chloroplast. - Class III enzymes (GSIII) have, currently,
only been found in Bacteroides fragilis and in Butyrivibrio fibrisolvens. GSIII is a hexamer of identical chains. It is much
larger (about 700 amino acids) than the GSI (450 to 470 amino acids) or GSII (350 to 420 amino acids) enzymes.

While the three classes of GS's are clearly structurally related, the sequence similarities are not so extensive. As
signature patterns three conserved regions were selected. The first pattern is based on a conserved tetrapeptide in
the N-terminal section of the enzyme, the second one is based on a glycine-rich region which is thought to be involved
in ATP-binding. The third pattern is specific to class I glutamine synthetases and includes the tyrosine residue which
is reversibly adenylated.

[0785] Consensus pattern: [FYWL]-D-G-S-S-x(6,8)-[DENQSTAK]-[SA]-[DE]-x(2)-[LIVMFY]-
Consensus pattern: K-P-[LIVMFYA]-x(3,5)-[NPAT]-G-[GSTAN]-G-x-H-x(3)-S-
Consensus pattern: K-[LIVM]-x(5)-[LIVMA]-D-[RKHDN]-[LI]-Y [Y is the site of adenylation]-
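
The consensus patterns in this section are written in a PROSITE-style syntax that maps closely onto regular expressions. The following is a minimal sketch of that mapping, assuming Python and its standard `re` module. The helper name `prosite_to_regex` and the test peptide are illustrative assumptions and are not part of the specification. Only the elements used in this section are handled: single residues, bracketed residue classes, `x` for any residue, and repeat counts such as `(2)` or `(6,8)`. A trailing hyphen, as printed after some patterns, is ignored.

```python
import re


def prosite_to_regex(pattern: str) -> str:
    """Translate a PROSITE-style consensus pattern into a Python regex.

    Supported elements: single residues, [classes], x (any residue),
    and repeat counts written as (n) or (n,m).
    """
    parts = []
    for element in pattern.strip().strip("-").split("-"):
        element = element.strip()
        # Split off an optional repeat suffix such as (2) or (6,8).
        m = re.fullmatch(r"(.+?)(?:\((\d+)(?:,(\d+))?\))?", element)
        core, lo, hi = m.group(1), m.group(2), m.group(3)
        core = "." if core == "x" else core
        if lo and hi:
            core += "{%s,%s}" % (lo, hi)
        elif lo:
            core += "{%s}" % lo
        parts.append(core)
    return "".join(parts)


# Third glutamine synthetase pattern from the text (class I, adenylation site).
gs_class1 = prosite_to_regex("K-[LIVM]-x(5)-[LIVMA]-D-[RKHDN]-[LI]-Y")

# Hypothetical peptide built to satisfy the pattern; not a real GS sequence.
hit = re.search(gs_class1, "MMKLAAAAAVDRLYGG")
print(gs_class1, hit.start() if hit else None)
```

The variable-length element `x(6,8)` of the first pattern becomes `.{6,8}`, so a single expression covers all permitted spacings between the conserved tetrapeptide and the downstream residues.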

[ 1] Eisenberg D., Almassy R.J., Janson C.A., Chapman M.S., Suh S.W., Cascio D., Smith W.W. Cold Spring 
Harbor Symp. Quant. Biol. 52:483-490(1987).

[ 2] Kumada Y., Benson D.R., Hillemann D., Hosted T.J., Rochefort D.A., Thompson C.J., Wohlleben W., Tateno
Y. Proc. Natl. Acad. Sci. U.S.A. 90:3009-3013(1993).

[ 3] Shatters R.G., Kahn M.L. J. Mol. Evol. 29:422-428(1989).

[ 4] Brown J.R., Masuchi Y., Robb F.T., Doolittle W.F. J. Mol. Evol. 38:566-576(1994).

[0786] 261. Globins profile (globin1)

Globins are heme-containing proteins involved in binding and/or transporting oxygen [1]. They belong to a very large
and well studied family which is widely distributed in many organisms. The major groups of globins are: - Hemoglobins
(Hb) from vertebrates. Hb is the protein responsible for transporting oxygen from the lungs to other tissues. It is a
tetramer of two alpha and two beta chains. Most vertebrate species also express specific embryonic or fetal forms of
hemoglobin where the alpha or the beta chains are replaced by a chain with higher oxygen affinity, as for the gamma,
delta, epsilon and zeta chains in mammals, for example. - Myoglobins (Mg) from vertebrates. Mg is a monomeric
protein responsible for oxygen storage in muscles. - Invertebrate globins [2]. A wide variety of globins are found in
invertebrates. Molluscs generally have one or two muscle globins which are either monomeric or dimeric. Insects, such
as the midge Chironomus thummi, have a large set of extracellular globins. Nematodes and annelids have a variety
of intracellular and extracellular globins; some of them are multi-domain polypeptides (from two up to nine-domain
globins) and some produce large, disulfide-bonded aggregates. - Leghemoglobins (Lg) from the root nodules of
leguminous plants. Lg provides oxygen for bacteroids. - Flavohemoproteins from bacteria (Escherichia coli hmpA) and
fungi [3]. These proteins consist of two distinct domains: an N-terminal globin domain and a C-terminal FAD-containing
reductase domain. In bacteria such as Vitreoscilla, the enzyme-associated globin is a single domain protein. All these
globins seem to have evolved from a common ancestor. The profile developed to detect members of the globin family
is based on a structural alignment of selected globin sequences.

[ 1] Concise Encyclopedia Biochemistry, Second Edition, Walter de Gruyter, Berlin New-York (1988).
[ 2] Goodman M., Pedwaydon J., Czelusniak J., Suzuki T., Gotoh T., Moens L., Shishikura F., Walz D., Vinogradov S. J. Mol. Evol. 27:
236-249(1988).

[0787] Plant hemoglobins signature (globin2) 

Leghemoglobins [1] are hemoproteins present in the root nodules of leguminous plants. Leghemoglobins are structurally
and functionally related to hemoglobin and myoglobin. By providing oxygen to the bacteroids, they are essential for
symbiotic nitrogen fixation. Structurally related hemoglobins from the nodules of non-leguminous plants [2,3], and from
the roots of non-nodulating plants [4] have been recently sequenced. A signature pattern was developed that picks up
the sequences of plant hemoglobins exclusively.
[0788] Consensus pattern: [SN]-P-x-L-x(2)-H-A-x(3)-F-

[ 1] Powell R., Gannon F. BioEssays 9:117-121(1988).

[ 2] Kortt A.A., Trinick M.J., Appleby C.A. Eur. J. Biochem. 175:141-149(1988).
[ 3] Kortt A.A., Inglis A.S., Fleming A.I., Appleby C.A. FEBS Lett. 231:341-346(1988).

[ 4] Bogusz D., Appleby C.A., Landsmann J., Dennis E.S., Trinick M.J., Peacock W.J. Nature 331:178-180(1988).
[0789] 262. Fructose-bisphosphate aldolase class-I active site (glycolytic_enz)

[0790] Fructose-bisphosphate aldolase [1,2] is a glycolytic enzyme that catalyzes the reversible aldol cleavage or
condensation of fructose-1,6-bisphosphate into dihydroxyacetone-phosphate and glyceraldehyde 3-phosphate. There
are two classes of fructose-bisphosphate aldolases with different catalytic mechanisms. Class-I aldolases [3], mainly
found in higher eukaryotes, are homotetrameric enzymes which form a Schiff-base intermediate between the C-2
carbonyl group of the substrate (dihydroxyacetone phosphate) and the epsilon-amino group of a lysine residue. In
vertebrates, three forms of this enzyme are found: aldolase A in muscle, aldolase B in liver and aldolase C in brain.
The sequence around the lysine involved in the Schiff-base is highly conserved and can be used as a signature for
this class of enzyme.

[0791] Consensus pattern: [LIVM]-x-[LIVMFYW]-E-G-x-[LS]-L-K-P-[SN] [K is involved in Schiff-base formation]-
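
Several patterns in this section annotate one residue as functionally important, here the lysine that forms the Schiff base. The following is a minimal sketch of how the position of such a residue could be reported for each match, assuming Python and its standard `re` module. The regular expression is a hand translation of the class-I aldolase pattern above. The group name `schiff_lys`, the function name and the test peptide are illustrative assumptions and are not part of the specification.

```python
import re

# Class-I aldolase signature from the text, written by hand as a regex.
# The named group marks the lysine (K) annotated as forming the Schiff base.
ALDOLASE_CLASS1 = re.compile(r"[LIVM].[LIVMFYW]EG.[LS]L(?P<schiff_lys>K)P[SN]")


def schiff_base_lysines(sequence: str) -> list:
    """Return the 1-based position of the signature lysine for each match."""
    return [m.start("schiff_lys") + 1 for m in ALDOLASE_CLASS1.finditer(sequence)]


# Hypothetical peptide built to satisfy the pattern; not a real aldolase.
peptide = "AAAVALEGTLLKPNAAA"
print(schiff_base_lysines(peptide))
```

Positions are reported 1-based, following the usual convention for residue numbering in protein sequences.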

[ 1] Perham R.N. Biochem. Soc. Trans. 18:185-187(1990).

[ 2] Marsh J.J., Lebherz H.G. Trends Biochem. Sci. 17:110-113(1992).

[ 3] Freemont P.S., Dunbar B., Fothergill-Gilmore L.A. Biochem. J. 249:779-788(1988).

[0792] 263. Glycosyl hydrolases family 11 active sites signatures 

The microbial degradation of cellulose and xylans requires several types of enzymes such as endoglucanases (EC
3.2.1.4), cellobiohydrolases (EC 3.2.1.91) (exoglucanases), or xylanases (EC 3.2.1.8) [1,2]. Fungi and bacteria
produce a spectrum of cellulolytic enzymes (cellulases) and xylanases which, on the basis of sequence similarities, can
be classified into families. One of these families is known as the cellulase family G [3] or as the glycosyl hydrolases
family 11 [4,E1]. The enzymes which are currently known to belong to this family are listed below. - Aspergillus awamori
xylanase C (xynC). - Bacillus circulans, pumilus, stearothermophilus and subtilis xylanase (xynA). - Clostridium
acetobutylicum xylanase (xynB). - Clostridium stercorarium xylanase A (xynA). - Fibrobacter succinogenes xylanase C
(xynC) which consists of two catalytic domains that both belong to family 10. - Neocallimastix patriciarum xylanase A
(xynA). - Ruminococcus flavefaciens bifunctional xylanase XYLA (xynA). This protein consists of three domains: an
N-terminal xylanase catalytic domain that belongs to family 11 of glycosyl hydrolases; a central domain composed of
short repeats of Gln, Asn and Trp; and a C-terminal xylanase catalytic domain that belongs to family 10 of glycosyl
hydrolases. - Schizophyllum commune xylanase A. - Streptomyces lividans xylanases B (xlnB) and C (xlnC). -
Trichoderma reesei xylanases I and II. Two of the conserved regions in these enzymes are centered on glutamic acid
residues which have both been shown [5], in Bacillus pumilus xylanase, to be necessary for catalytic activity. Both
regions were used as signature patterns.

[0793] Consensus pattern: [PSA]-[LQ]-x-E-Y-Y-[LIVM](2)-[DE]-x-[FYWHN] [E is an active site residue]- 
Consensus pattern: [LIVMF]-x(2)-E-[AG]-[YWG]-[QRFGS]-[SG]-[STAN]-G-x-[SAF] [E is an active site residue]-

[ 1] Beguin P. Annu. Rev. Microbiol. 44:219-248(1990).

[ 2] Gilkes N.R., Henrissat B., Kilburn D.G., Miller R.C. Jr., Warren R.A.J. Microbiol. Rev. 55:303-315(1991).
[ 3] Henrissat B., Claeyssens M., Tomme P., Lemesle L., Mornon J.-P. Gene 81:83-95(1989).
[ 4] Henrissat B. Biochem. J. 280:309-316(1991).
[ 5] Ko E.P., Akatsuka H., Moriyama H., Shinmyo A., Hata Y., Katsube Y., Urabe I., Okada H. Biochem. J. 288:
117-121(1992).

[0794] 264. Glycosyl hydrolase family 14 
[0795] This family are beta amylases. 
[0796] 265. Glycosyl hydrolases family 1 signatures

It has been shown [1 to 4] that the following glycosyl hydrolases can be, on the basis of sequence similarities, classified
into a single family: - Beta-glucosidases (EC 3.2.1.21) from various bacteria such as Agrobacterium strain ATCC 21400,
Bacillus polymyxa, and Caldocellum saccharolyticum. - Two plant (clover) beta-glucosidases (EC 3.2.1.21). - Two
different beta-galactosidases (EC 3.2.1.23) from the archaebacterium Sulfolobus solfataricus (genes bgaS and lacS). -
6-phospho-beta-galactosidases (EC 3.2.1.85) from various bacteria such as Lactobacillus casei, Lactococcus lactis,
and Staphylococcus aureus. - 6-phospho-beta-glucosidases (EC 3.2.1.86) from Escherichia coli (genes bglB and ascB)
and from Erwinia chrysanthemi (gene arbB). - Plant myrosinases (EC 3.2.3.1) (sinigrinase) (thioglucosidase). -
Mammalian lactase-phlorizin hydrolase (LPH) (EC 3.2.1.108 / EC 3.2.1.62). LPH, an integral membrane glycoprotein, is
the enzyme that splits lactose in the small intestine. LPH is a large protein of about 1900 residues which contains four
tandem repeats of a domain of about 450 residues which is evolutionarily related to the above glycosyl hydrolases. One
of the conserved regions in these enzymes is centered on a conserved glutamic acid residue which has been shown
[5], in the beta-glucosidase from Agrobacterium, to be directly involved in glycosidic bond cleavage by acting as a
nucleophile. This region was used as a signature pattern. As a second signature pattern a conserved region was
selected, found in the N-terminal extremity of these enzymes; this region also contains a glutamic acid residue.

[0797] Consensus pattern: [LIVMFSTC]-[LIVFYS]-[LIV]-[LIVMST]-E-N-G-[LIVMFAR]-[CSAGN] [E is the active site
residue] 

Note: this pattern will pick up the last two domains of LPH; the first two domains, which are removed from the LPH 
precursor by proteolytic processing, have lost the active site glutamate and may therefore be inactive [4]. 
[0798] Consensus pattern: F-x-[FYWM]-[GSTA]-x-[GSTA]-x-[GSTA](2)-[FYNH]-[NQ]-x-E-x-[GSTA]-

[ 1] Henrissat B. Biochem. J. 280:309-316(1991). 
[ 2] Henrissat B. Protein Seq. Data Anal. 4:61-62(1991). 
[ 3] Gonzalez-Candelas L., Ramon D., Polaina J. Gene 95:31-38(1990).
[ 4] El Hassouni M., Henrissat B., Chippaux M., Barras F. J. Bacteriol. 174:765-777(1992).

[ 5] Withers S.G., Warren R.A.J., Street I.P., Rupitz K., Kempton J.B., Aebersold R. J. Am. Chem. Soc. 112:
5887-5889(1990). 

[0799] 266. Glycosyl hydrolases family 2 signatures 

It has been shown [1,2,E1] that the following glycosyl hydrolases can be, on the basis of sequence similarities, classified
into a single family: - Beta-galactosidases (EC 3.2.1.23) from bacteria such as Escherichia coli (genes lacZ and ebgA),
Clostridium acetobutylicum, Clostridium thermosulfurogenes, Klebsiella pneumoniae, Lactobacillus delbrueckii, or
Streptococcus thermophilus and from the fungus Kluyveromyces lactis. - Beta-glucuronidase (EC 3.2.1.31) from
Escherichia coli (gene uidA) and from mammals. One of the conserved regions in these enzymes is centered on a
conserved glutamic acid residue which has been shown [3], in Escherichia coli lacZ, to be the general acid/base catalyst
in the active site of the enzyme. This region was used as a signature pattern. As a second signature pattern a highly
conserved region was selected, located some sixty residues upstream from the active site glutamate.
[0800] Consensus pattern: N-x-[LIVMFYWD]-R-[STACN](2)-H-Y-P-x(4)-[LIVMFYWS](2)-x(3)-[DN]-x(2)-G-[LIVMFYW](4)-

Consensus pattern: [DENQLF]-[KRVW]-N-[HRY]-[STAPV]-[SAC]-[LIVMFS](3)-W-[GS]-x(2,3)-N-E [E is the active site
residue]-

[ 1] Henrissat B. Biochem. J. 280:309-316(1991). 

[ 2] Schroeder C.J., Robert C., Lenzen G., McKay L.L., Mercenier A. J. Gen. Microbiol. 137:369-380(1991).
[ 3] Gebler J.C., Aebersold R., Withers S.G. J. Biol. Chem. 267:11126-11130(1992).

[0801] 267. Glycosyl hydrolases family 3 active site 

It has been shown [1,2] that the following glycosyl hydrolases can be, on the basis of sequence similarities, classified 
into a single family: 

- Beta glucosidases (EC 3.2.1.21) from the fungi Aspergillus wentii (A-3), Hansenula anomala, Kluyveromyces
fragilis, Saccharomycopsis fibuligera (BGL1 and BGL2), Schizophyllum commune and Trichoderma reesei (BGL1).
- Beta glucosidases from the bacteria Agrobacterium tumefaciens (Cbg1), Butyrivibrio fibrisolvens (bglA),
Clostridium thermocellum (bglB), Escherichia coli (bglX), Erwinia chrysanthemi (bgxA) and Ruminococcus albus.
- Alteromonas strain O-7 beta-hexosaminidase A (EC 3.2.1.52).
- Bacillus subtilis hypothetical protein yzbA.
- Escherichia coli hypothetical protein ycfO and HI0959, the corresponding Haemophilus influenzae protein.

One of the conserved regions in these enzymes is centered on a conserved aspartic acid residue which has been
shown [3], in Aspergillus wentii beta-glucosidase A3, to be implicated in the catalytic mechanism. This region was used
as a signature pattern.

[0802] Consensus pattern: [LIVM](2)-[KR]-x-[EQK]-x(4)-G-[LIVMFT]-[LIVT]-[LIVMF]-[ST]-D-x(2)-[SGADNI] [D is the
active site residue]

[ 1] Henrissat B. Biochem. J. 280:309-316(1991). 

[ 2] Castle L.A., Smith K.D., Morris R.O. J. Bacteriol. 174:1478-1486(1992).
[ 3] Bause E., Legler G. Biochim. Biophys. Acta 626:459-465(1980).


[0803] 268. Glycosyl hydrolases family 8 signature

The microbial degradation of cellulose and xylans requires several types of enzymes such as endoglucanases (EC
3.2.1.4), cellobiohydrolases (EC 3.2.1.91) (exoglucanases), or xylanases (EC 3.2.1.8) [1,2]. Fungi and bacteria
produce a spectrum of cellulolytic enzymes (cellulases) and xylanases which, on the basis of sequence similarities, can
be classified into families. One of these families is known as the cellulase family D [3] or as the glycosyl hydrolases
family 8 [4,E1]. The enzymes which are currently known to belong to this family are listed below. - Acetobacter xylinum
endonuclease cmcAX. - Bacillus strain KSM-330 acidic endonuclease K (Endo-K). - Cellulomonas josui endoglucanase
2 (celB). - Cellulomonas uda endoglucanase. - Clostridium cellulolyticum endoglucanases C (celCCC). - Clostridium
thermocellum endoglucanases A (celA). - Erwinia chrysanthemi minor endoglucanase Y (celY). - Bacillus circulans
beta-glucanase (EC 3.2.1.73). - Escherichia coli hypothetical protein yhjM. The most conserved region in these
enzymes is a stretch of about 20 residues that contains two conserved aspartates. The first aspartate is thought [5] to
act as the nucleophile in the catalytic mechanism. This region was used as a signature pattern.
Consensus pattern: A-[ST]-D-[AG]-D-x(2)-[IM]-A-x-[SA]-[LIVM]-[LIVMG]-x-A-x(3)-[FW] [The first D is an active site
residue]-


[ 1] Beguin P. Annu. Rev. Microbiol. 44:219-248(1990). 

[ 2] Gilkes N.R., Henrissat B., Kilburn D.G., Miller R.C. Jr., Warren R.A.J. Microbiol. Rev. 55:303-315(1991).
[ 3] Henrissat B., Claeyssens M., Tomme P., Lemesle L., Mornon J.-P. Gene 81:83-95(1989).
[ 4] Henrissat B. Biochem. J. 280:309-316(1991).
[ 5] Alzari P.M., Souchon H., Dominguez R. Structure 4:265-275(1996).

[0804] 269. Glycosyl hydrolases family 9 active sites signatures 

The microbial degradation of cellulose and xylans requires several types of enzymes such as endoglucanases (EC
3.2.1.4), cellobiohydrolases (EC 3.2.1.91) (exoglucanases), or xylanases (EC 3.2.1.8) [1,2]. Fungi and bacteria produce
a spectrum of cellulolytic enzymes (cellulases) and xylanases which, on the basis of sequence similarities, can be
classified into families. One of these families is known as the cellulase family E [3] or as the glycosyl hydrolases family
9 [4,E1]. The enzymes which are currently known to belong to this family are listed below. - Butyrivibrio fibrisolvens
cellodextrinase 1 (ced1). - Cellulomonas fimi endoglucanases B (cenB) and C (cenC). - Clostridium cellulolyticum
endoglucanase G (celCCG). - Clostridium cellulovorans endoglucanase C (engC). - Clostridium stercorarium
endoglucanase Z (avicelase I) (celZ). - Clostridium thermocellum endoglucanases D (celD), F (celF) and I (celI). - Fibrobacter
succinogenes endoglucanase A (endA). - Pseudomonas fluorescens endoglucanase A (celA). - Streptomyces reticuli
endoglucanase 1 (cel1). - Thermomonospora fusca endoglucanase E-4 (celD). - Dictyostelium discoideum spore
germination specific endoglucanase 270-6. This slime mold enzyme may digest the spore cell wall during germination, to
release the enclosed amoeba. - Endoglucanases from plants such as avocado or French bean. In plants this enzyme
may be involved in the fruit ripening process. Two of the most conserved regions in these enzymes are centered on
conserved residues which have been shown [5,6], in the endoglucanase D from Clostridium thermocellum, to be
important for the catalytic activity. The first region contains an active site histidine and the second region contains two
catalytically important residues: an aspartate and a glutamate. Both regions were used as signature patterns.
[0805] Consensus pattern: [STV]-x-[LIVMFY]-[STV]-x(2)-G-x-[NKR]-x(4)-[PLIVM]-H-x-R [H is an active site residue]-

Consensus pattern: [FYW]-x-D-x(4)-[FYW]-x(3)-E-x-[STA]-x(3)-N-[STA] [D and E are active site residues]-

[ 1] Beguin P. Annu. Rev. Microbiol. 44:219-248(1990).
[ 2] Gilkes N.R., Henrissat B., Kilburn D.G., Miller R.C. Jr., Warren R.A.J. Microbiol. Rev. 55:303-315(1991).
[ 3] Henrissat B., Claeyssens M., Tomme P., Lemesle L., Mornon J.-P. Gene 81:83-95(1989).
[ 4] Henrissat B. Biochem. J. 280:309-316(1991).

[ 5] Tomme P., Chauvaux S., Beguin P., Millet J., Aubert J.-P., Claeyssens M. J. Biol. Chem. 266:10313-10318
(1991).

[ 6] Tomme P., van Beeumen J., Claeyssens M. Biochem. J. 285:319-324(1992). 




[0806] 270. Glyceraldehyde 3-phosphate dehydrogenase active site (gpdh) 

Glyceraldehyde 3-phosphate dehydrogenase (EC 1.2.1.12) (GAPDH) [1] is a tetrameric NAD-binding enzyme common
to both the glycolytic and gluconeogenic pathways. A cysteine in the middle of the molecule is involved in forming a
covalent phosphoglycerol thioester intermediate. The sequence around this cysteine is totally conserved in eubacterial
and eukaryotic GAPDHs and is also present, albeit in a variant form, in the otherwise highly divergent archaebacterial
GAPDH [2]. Escherichia coli D-erythrose 4-phosphate dehydrogenase (E4PDH) (gene epd or gapB) is an enzyme highly
related to GAPDH [3].

[0807] Consensus pattern: [ASV]-S-C-[NT]-T-x(2)-[LIM] [C is the active site residue]-

[ 1] Harris J.I., Waters M. (In) The Enzymes (3rd edition) 13:1-50(1976).

[ 2] Fabry S., Lang J., Niermann T., Vingron M., Hensel R. Eur. J. Biochem. 179:405-413(1989).
[ 3] Zhao G., Pease A.J., Bharani N., Winkler M.E. J. Bacteriol. 177:2804-2812(1995).

[0808] 271. Granulins signature
Granulins [1] are a family of cysteine-rich peptides of about 6 Kd which may have multiple biological activities. A precursor
protein (known as acrogranin) potentially encodes seven different forms of granulin (grnA to grnG) which are probably
released by post-translational proteolytic processing. A schematic representation of the structure of a granulin is shown
below:

xxxCxxxxxCxxxxxCCxxxxxxxxCCxxxxxxCCxxxxxCCxxxxxCxxxxxxCx
                                  **************
'C': conserved cysteine probably involved in a disulfide bond.
'*': position of the pattern.

Granulins are evolutionarily related to PMP-D1, a peptide
extracted from the pars intercerebralis of migratory locusts [2].
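
The position of the granulin pattern within the schematic can be checked by reducing the consensus pattern C-x-D-x(2)-H-C-C-P-x(4)-C to its cysteine skeleton. The following is a minimal sketch, assuming Python and its standard `re` module. The variable names are illustrative assumptions and are not part of the specification. Because the schematic writes every non-cysteine position as 'x', the D, H and P positions of the pattern are also treated as 'x' here.

```python
import re

# Schematic from the text: 'C' is a conserved cysteine, 'x' any other residue.
schematic = "xxxCxxxxxCxxxxxCCxxxxxxxxCCxxxxxxCCxxxxxCCxxxxxCxxxxxxCx"

# Cysteine skeleton of C-x-D-x(2)-H-C-C-P-x(4)-C:
# C, five non-cysteine positions, C-C, five non-cysteine positions, C.
skeleton = re.compile(r"Cx{5}CCx{5}C")

match = skeleton.search(schematic)

# Build a marker line with '*' under the matched span of the schematic.
marker = " " * match.start() + "*" * (match.end() - match.start())
print(schematic)
print(marker)
```

The skeleton spans 14 positions and fits the schematic at one location only, which is consistent with a single marked region.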

[0809] Consensus pattern: C-x-D-x(2)-H-C-C-P-x(4)-C [The four C's are probably involved in disulfide bonds]- 

[ 1] Bhandari V., Palfree R.G., Bateman A. Proc. Natl. Acad. Sci. U.S.A. 89:1715-1719(1992).
[ 2] Nakakura N., Hietter H., van Dorsselaer A., Luu B. Eur. J. Biochem. 204:147-153(1992).

[0810] 272. (HCV RdRp) Hepatitis C virus RNA dependent RNA polymerase 

[0811] The RNA dependent RNA polymerase is also known as non-structural protein NS5B. NS5B is a 65 kDa protein
that resembles other viral RNA polymerases. HCV replication is thought to occur in membrane bound replication
complexes. These complexes transcribe the positive strand and the resulting minus strand is used as a template for the
synthesis of genomic RNA. There are two viral proteins involved in the reaction, NS3 and NS5B [1,2].

[0812] [1] Lohmann V, Korner F, Herian U, Bartenschlager R; J Virol 1997;71:8416-8428.
[2] Behrens SE, Tomei L, De Francesco R; EMBO J 1996;15:12-22.
[3] Ishido S, Fujita T, Hotta H; Biochem Biophys Res Commun 1998;244:35-40.

[0813] 273. (HHH) Helix-hairpin-helix motif. 

[0814] [1] Doherty AJ, Serpell LC, Ponting CP; Nucleic Acids Res 1996;24:2488-2497. 
[0815] 274. HIT family signature 

Recently a family of small proteins of about 12 to 16 Kd has been described [1]. This family currently consists of: -
Mammalian protein HINT (also known as Protein kinase C inhibitor 1 or PKCI-1). HINT was incorrectly thought to be
a specific inhibitor of PKC. It has been shown to bind zinc. - Fission yeast diadenosine 5',5'''-P1,P4-tetraphosphate
asymmetrical hydrolase (Ap4Aase) (EC 3.6.1.17) [2] (gene aph1), which cleaves A-5'-PPPP-5'A to yield AMP and ATP.
- FHIT, a human protein whose gene is altered in different tumors and which acts [3] as a diadenosine 5',5'''-P1,P3-triphosphate
hydrolase (Ap3Aase) (EC 3.6.1.29) cleaving A-5'-PPP-5'A to yield AMP and ADP. - Yeast proteins HNT1
and HNT2. - Maize zinc-binding protein ZBP14. - Escherichia coli hypothetical protein ycfF. - Haemophilus influenzae
hypothetical protein HI0961. - Helicobacter pylori hypothetical protein HP0404. - Methanococcus jannaschii
hypothetical protein MJ0866. - Mycobacterium leprae hypothetical protein U296A. - Synechocystis strain PCC 6803 hypothetical
protein slr1234. - Caenorhabditis elegans hypothetical protein F21C3.3. - A hypothetical 13.2 Kd protein in hisE 3'region
in Azospirillum brasilense. - A hypothetical 13.1 Kd protein in p37 5'region in Mycoplasma hyorhinis. - A hypothetical
12.4 Kd protein in psbAM 5'region in Synechococcus strain PCC 7942. All these proteins contain a region with three
clustered histidines. This region is responsible for the designation of this family: HIT, for 'HIstidine Triad' [1]. This region
was originally thought to be involved in the binding of a zinc ion but was later identified [4] as part of the alpha-phosphate
binding site of a nucleotide-binding domain. As a signature pattern, the region of the histidine triad was selected.
[0816] Consensus pattern: [NQA]-x(4)-[GAV]-x-[QF]-x-[LIVM]-x-H-[LIVMFYT]-H-[LIVMFT]-H-[LIVMF](2)-[PSGA]- 


[ 1] Seraphin B. DNA Seq. 3:177-179(1992).
[ 2] Huang Y., Garrison P.N., Barnes L.D. Biochem. J. 312:925-932(1995).

[ 3] Barnes L.D., Garrison P.N., Siprashvili Z., Guranowski A., Robinson A.K., Ingram S.W., Croce C.M., Ohta M.,
Huebner K. Biochemistry 35:11529-11535(1996).

[ 4] Brenner C., Garrison P., Gilmour J., Peisach D., Ringe D., Petsko G.A., Lowenstein J.M. Nat. Struct. Biol. 4:
231-238(1997).

[0817] 275. Myc-type, 'helix-loop-helix' dimerization domain signature (HLH)

A number of eukaryotic proteins, which probably are sequence specific DNA-binding proteins that act as transcription
factors, share a conserved domain of 40 to 50 amino acid residues. It has been proposed [1] that this domain is formed
of two amphipathic helices joined by a variable length linker region that could form a loop. This 'helix-loop-helix' (HLH)
domain mediates protein dimerization and has been found in the proteins listed below [2,3,E1,E2]. Most of these
proteins have an extra basic region of about 15 amino acid residues that is adjacent to the HLH domain and specifically
binds to DNA. They are referred to as basic helix-loop-helix proteins (bHLH), and are classified in two groups: class A
(ubiquitous) and class B (tissue-specific). Members of the bHLH family bind variations on the core sequence 'CANNTG',
also referred to as the E-box motif. The homo- or heterodimerization mediated by the HLH domain is independent of,
but necessary for, DNA binding, as two basic regions are required for DNA binding activity. The HLH proteins lacking
the basic domain (Emc, Id) function as negative regulators since they form heterodimers, but fail to bind DNA. The
hairy-related proteins (hairy, E(spl), deadpan) also repress transcription although they can bind DNA. The proteins of
this subfamily act together with co-repressor proteins, like groucho, through their C-terminal motif WRPW. - The myc
family of cellular oncogenes [4], which is currently known to contain four members: c-myc [E3], N-myc, L-myc, and
B-myc. The myc genes are thought to play a role in cellular differentiation and proliferation. - Proteins involved in
myogenesis (the induction of muscle cells). In mammals MyoD1 (Myf-3), myogenin (Myf-4), Myf-5, and Myf-6 (Mrf4 or
herculin), in birds CMD1 (QMF-1), in Xenopus MyoD and MF25, in Caenorhabditis elegans CeMyoD, and in Drosophila
nautilus (nau). - Vertebrate proteins that bind specific DNA sequences ('E boxes') in various immunoglobulin chain
enhancers: E2A or ITF-1 (E12/pan-2 and E47/pan-1), ITF-2 (tcf4), TFE3, and TFEB. - Vertebrate neurogenic
differentiation factor 1 that acts as differentiation factor during neurogenesis. - Vertebrate MAX protein, a transcription regulator
that forms a sequence-specific DNA-binding protein complex with myc or mad. - Vertebrate Max Interacting Protein
1 (MXI1 protein) which acts as a transcriptional repressor and may antagonize myc transcriptional activity by competing
for max. - Proteins of the bHLH/PAS superfamily which are transcriptional activators. In mammals, AH receptor nuclear
translocator (ARNT), single-minded homologs (SIM1 and SIM2), hypoxia-inducible factor 1 alpha (HIF1A), AH receptor
(AHR), neuronal pas domain proteins (NPAS1 and NPAS2), endothelial pas domain protein 1 (EPAS1), mouse ARNT2,
and human BMAL1. In Drosophila, single-minded (SIM), AH receptor nuclear translocator (ARNT), trachealess protein
(TRH) and similar protein (SIMA). - Mammalian transcription factors HES, which repress transcription by acting on
two types of DNA sequences, the E box and the N box. - Mammalian MAD protein (max dimerizer) which acts as
transcriptional repressor and may antagonize myc transcriptional activity by competing for max. - Mammalian Upstream
Stimulatory Factor 1 and 2 (USF1 and USF2), which bind to a symmetrical DNA sequence that is found in a variety of
viral and cellular promoters. - Human lyl-1 protein, which is involved, by chromosomal translocation, in T-cell leukemia.
- Human transcription factor AP-4. - Mouse helix-loop-helix proteins MATH-1 and MATH-2 which activate E
box-dependent transcription in collaboration with E47. - Mammalian stem cell protein (SCL) (also known as tal1), a protein
which may play an important role in hemopoietic differentiation. SCL is involved, by chromosomal translocation, in
stem-cell leukemia. - Mammalian proteins Id1 to Id4 [5]. Id (inhibitor of DNA binding) proteins lack a basic DNA-binding
domain but are able to form heterodimers with other HLH proteins, thereby inhibiting binding to DNA. - Drosophila
extra-macrochaetae (emc) protein, which participates in sensory organ patterning by antagonizing the neurogenic
activity of the achaete-scute complex. Emc is the homolog of mammalian Id proteins. - Human Sterol Regulatory
Element Binding Protein 1 (SREBP-1), a transcriptional activator that binds to the sterol regulatory element 1
(SRE-1) found in the flanking region of the LDLR gene and in other genes. - Drosophila achaete-scute (AS-C) complex
proteins T3 (l'sc), T4 (scute), T5 (achaete) and T8 (asense). The AS-C proteins are involved in the determination of
the neuronal precursors in the peripheral nervous system and the central nervous system. - Mammalian homologs of
achaete-scute proteins, the MASH-1 and MASH-2 proteins. - Drosophila atonal protein (ato) which is involved in
neurogenesis. - Drosophila daughterless (da) protein, which is essential for neurogenesis and sex-determination. -
Drosophila deadpan (dpn), a hairy-like protein involved in the functional differentiation of neurons. - Drosophila delilah
(dei) protein which plays an important role in the differentiation of epidermal cells into muscle. - Drosophila hairy
(h) protein a transcriptional repressor which regulates the embryonic segmentation and adult bristle patterning. - Dro- 
sophila enhancer ol split proteins E(spl), that are hairy-like proteins active during neurogenesis, also act as transcrip- 
tional repressors. - Drosophila twist (twi) protein, which is involved in the establishment of germ layers in embryos. - 
Maize anthocyanin regulatory proteins R-S and LC. - Yeast centromere-binding protein 1 (CPF1 or CBF1). This protein 
ss is involved in chromosomal segregation. It binds to a highly conserved DNA sequence, found in centromers and in 
several promoters. - Yeast IN02 and IN04 proteins. - Yeast phosphate system positive regulatory protein PH04 which 
interacts with the upstream activating sequence of several acid phosphatase genes. - Yeast senne-nch protein TYE7 
that is required for ty-mediated ADH2 expression. - Neurospora crassa nuc-1 , a protein that activates the transcription 




EP 1 033 405 A2 




[Text illegible in the source: the end of the preceding entry and an entry on 3-hydroxy-3-methylglutaryl-coenzyme A lyase, a mitochondrial enzyme.]

[0826] 279. Hydroxymethylglutaryl-coenzyme A synthase (HMG_CoA_synt)
[0827] Hydroxymethylglutaryl-coenzyme A synthase catalyzes the condensation of acetyl-CoA with acetoacetyl-CoA
to produce HMG-CoA and CoA. In vertebrates there are two isozymes located in different subcellular compartments:
a cytosolic form, which is the starting point of the mevalonate pathway which leads to cholesterol and other sterolic and
isoprenoid compounds, and a mitochondrial form, which is responsible for ketone body biosynthesis. A cysteine acts as
the nucleophile in the first step of the reaction. The region around this residue was used as a signature pattern.
[0828] Consensus pattern: …-[IV]-E-G-[IV]-D-x(2)-N-A-C-[FY]-x-G
[0829] [ 1] Rokosz L.L., Boulton D.A., Butkiewicz E.A., Sanyal G., Cueto M.A., Lachance P.A., Hermes J.D. Arch.
Biochem. Biophys. 312:1-13(1994).
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[0831] 280 HSF-type DNA-binding domain signature 

[0832] Heat shock factor (HSF) is a DNA-binding protein that specifically binds heat shock promoter elements (HSE).
HSE is a palindromic element rich with repetitive purine and pyrimidine motifs: 5'-nGAAnnTTCnnGAAnnTTCn-3'. HSF
is expressed at normal temperatures but is activated by heat shock or chemical stressors [1,2]. The sequences of HSF
from various species show extensive similarity in a region of about 90 amino acids, which has been shown [3] to bind
DNA. Some other proteins also contain a HSF domain; these are: - Yeast SFL1, a protein involved in cell surface
assembly and regulation of the gene related to flocculation (asexual cell aggregation) [4]. - Yeast transcription factor
SKN7 (or BRY1 or POS9), which binds to the promoter elements SCB and MCB essential for the control of G1 cyclins
expression [5]. - Yeast MGA1. - Yeast hypothetical protein YJR147w. A pattern was derived from the most conserved
part of the HSF DNA-binding domain, its central region.

[0833] Consensus pattern: L-x(3)-[FY]-K-H-x-N-x-[STAN]-S-F-[LIVM]-R-Q-L-[NH]-x-Y-x-[FYW]-[RKH]-K-[LIVM]-

[ 1] Sorger P.K. Cell 65:363-366(1991).
[ 2] Mager W.H., Moradas Ferreira P. Biochem. J. 290:1-13(1993).
[ 3] Vuister G.W., Kim S.-J., Orosz A., Marquardt J., Wu C., Bax A. Nat. Struct. Biol. 1:605-613(1994).
[ 4] Fujita A., Kikuchi Y., Kuhara S., Misumi Y., Matsumoto S., Kobayashi H. Gene 85:321-328(1989).
[ 5] Morgan B.A., Bouquin N., Merrill G.F., Johnston L.H. EMBO J. 14:5679-5689(1995).
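
The HSE motif quoted above can be checked and searched for with a few lines of Python. This is only a minimal sketch: the function names and the test sequence are hypothetical, `n` is assumed to mean any of the four bases, and only the given strand is scanned.

```python
import re

# HSE element as quoted in the text (5'->3'); 'n' is assumed to mean any base.
HSE = "nGAAnnTTCnnGAAnnTTCn"

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A", "n": "n"}


def reverse_complement(motif):
    """Reverse-complement a motif; the wildcard 'n' pairs with 'n'."""
    return "".join(COMPLEMENT[base] for base in reversed(motif))


def motif_to_regex(motif):
    """Compile the motif into a regex in which 'n' matches A, C, G or T."""
    return re.compile(motif.replace("n", "[ACGT]"))


def find_hse(dna):
    """Return 0-based start offsets of non-overlapping HSE matches."""
    return [m.start() for m in motif_to_regex(HSE).finditer(dna.upper())]


# The element reads the same on both strands, which is what "palindromic" means here.
print(reverse_complement(HSE) == HSE)

# Hypothetical test sequence with one HSE embedded after two flanking bases.
print(find_hse("CCAGAACGTTCTAGAAGCTTCTGG"))
```

Because the motif equals its own reverse complement, scanning one strand is sufficient for this particular element.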

[0834] 281. Heat shock hsp20 proteins family profile

Prokaryotic and eukaryotic organisms respond to heat shock or other environmental stress by inducing the synthesis
of proteins collectively known as heat-shock proteins (hsp) [1]. Amongst them is a family of proteins with an average
molecular weight of 20 Kd, known as the hsp20 proteins [2 to 5]. These seem to act as chaperones that can protect
other proteins against heat-induced denaturation and aggregation. Hsp20 proteins seem to form large heterooligomeric
aggregates. This family is currently composed of the following members: - Vertebrate heat shock protein hsp27 (hsp25),
induced by a variety of environmental stresses. - Drosophila heat shock proteins hsp22, hsp23, hsp26, hsp27, hsp67BA
and BC. - Caenorhabditis elegans hsp16 multigene family. - Fungal HSP26 (budding yeast) and hsp30 (Neurospora
crassa and Aspergillus nidulans). - Plant small hsp's. Plants have four classes of hsp20: classes I and II which are
cytoplasmic, class III which is chloroplastic and class IV which is found in the endomembrane. - Alpha-crystallin A and
B chains. Alpha-crystallin is an abundant constituent of the eye lens of most vertebrate species. Its main function
appears to be to maintain the correct refractive index of the lens. It is also found in other tissues where it seems to act
as a chaperone [6]. - Schistosoma mansoni major egg antigen p40. Structurally, p40 is built of two tandem hsp20
domains. - A variety of prokaryotic proteins: ibpA and ibpB from Escherichia coli, hsp18 from Clostridium acetobutylicum,
spore protein SP21 (hspA) from Stigmatella aurantiaca, Mycobacterium leprae 18 Kd antigen and Mycobacterium
tuberculosis 14 Kd antigen. - Methanococcus jannaschii hypothetical protein MJ0285. Structurally, this family is
characterized by the presence of a conserved C-terminal domain of about 100 residues. The profile developed to detect
members of the hsp20 family is based on an alignment of this domain.
[0835] Sequences known to belong to this class detected by the profile: ALL.

[ 1] Lindquist S., Craig E.A. Annu. Rev. Genet. 22:631-677(1988).
[ 2] de Jong W.W., Leunissen J.A.M., Voorter C.E.M. Mol. Biol. Evol. 10:103-126(1993).
[ 3] Caspers G.J., Leunissen J.A.M., de Jong W.W. J. Mol. Evol. 40:238-248(1995).
[ 4] Jaenicke R., Creighton T.E. Curr. Biol. 3:234-235(1993).
[ 5] Jakob U., Buchner J. Trends Biochem. Sci. 19:205-211(1994).
[ 6] Groenen P.J.T.A., Merck K.B., de Jong W.W., Bloemendal H. Eur. J. Biochem. 225:1-9(1994).
[0836] 282. Heat shock hsp70 proteins family signatures 

[0837] Prokaryotic and eukaryotic organisms respond to heat shock or other environmental stress by the induction
of the synthesis of proteins collectively known as heat-shock proteins (hsp) [1]. Amongst them is a family of proteins
with an average molecular weight of 70 Kd, known as the hsp70 proteins [2,3,4]. In most species, there are many
proteins that belong to the hsp70 family. Some of them are expressed under unstressed conditions. Hsp70 proteins
can be found in different cellular compartments (nuclear, cytosolic, mitochondrial, endoplasmic reticulum, etc.). Some
of the hsp70 family proteins are listed below: - In Escherichia coli and other bacteria, the main hsp70 protein is known
as the dnaK protein. A second protein, hscA, has been recently discovered. dnaK is also found in the chloroplast
genome of red algae. - In yeast, at least ten hsp70 proteins are known to exist: SSA1 to SSA4, SSB1, SSB2, SSC1,
SSD1 (KAR2), SSE1 (MSI3) and SSE2. - In Drosophila, there are at least eight different hsp70 proteins: HSP70,
HSP68 and HSC-1 to HSC-6. - In mammals, there are at least eight different proteins: HSPA1 to HSPA6, HSC70, and
GRP78 (also known as the immunoglobulin heavy chain binding protein (BiP)). - In the sugar beet yellow virus (SBYV),
a hsp70 homolog has been shown [5] to exist. - In archaebacteria, hsp70 proteins are also present [6]. All proteins
belonging to the hsp70 family bind ATP. A variety of functions has been postulated for hsp70 proteins. It now appears
[7] that some hsp70 proteins play an important role in the transport of proteins across membranes. They also seem to
be involved in protein folding and in the assembly/disassembly of protein complexes [8]. Three signature patterns for
the hsp70 family of proteins were derived; the first centered on a conserved pentapeptide found in the N-terminal
section of these proteins; the two others on conserved regions located in the central part of the sequence.
[0838] Consensus pattern: [IV]-D-L-G-T-[ST]-x-[SC]-
Consensus pattern: [LIVMF]-[LIVMFY]-[DN]-[LIVMFS]-G-[GSH]-[GS]-[AST]-x(3)-[ST]-[LIVM]-[LIVMFC]-
Consensus pattern: [LIVMY]-x-[LIVMF]-x-G-G-x-[ST]-x-[LIVM]-P-x-[LIVM]-x-[DEQKRSTA]-

[ 1] Lindquist S., Craig E.A. Annu. Rev. Genet. 22:631-677(1988).
[ 2] Pelham H.R.B. Cell 46:959-961(1986).
[ 3] Pelham H.R.B. Nature 332:776-77(1988).
[ 4] Craig E.A. BioEssays 11:48-52(1989).
[ 5] Agranovsky A.A., Boyko V.P., Karasev A.V., Koonin E.V., Dolja V.V. J. Mol. Biol. 217:603-610(1991).
[ 6] Gupta R.S., Singh B. J. Bacteriol. 174:4594-4605(1992).
[ 7] Deshaies R.J., Koch B.D., Schekman R. Trends Biochem. Sci. 13:384-388(1988).
[ 8] Craig E.A., Gross C.A. Trends Biochem. Sci. 16:135-140(1991).
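
Consensus patterns written in this notation can be translated mechanically into regular expressions. The sketch below is one possible translation, not a method described in this document: `prosite_to_regex` is a hypothetical helper, it covers only the elements used in these entries (single residues, `x`, bracketed sets, braces for excluded residues, and `(n)` or `(n,m)` repeat counts), and the test peptide is an illustrative string rather than a curated sequence.

```python
import re

# One pattern element: residue, x, [allowed set] or {excluded set},
# optionally followed by a repeat count (n) or (n,m).
ELEMENT = re.compile(r"(x|[A-Z]|\[[A-Z]+\]|\{[A-Z]+\})(?:\((\d+)(?:,(\d+))?\))?")


def prosite_to_regex(pattern):
    """Translate a dash-separated consensus pattern into Python regex syntax."""
    parts = []
    # Tolerate the trailing dash that the patterns in this text carry.
    for element in pattern.strip().strip("-").split("-"):
        match = ELEMENT.fullmatch(element.strip())
        if match is None:
            raise ValueError(f"unsupported pattern element: {element!r}")
        core, low, high = match.groups()
        if core == "x":
            core = "."                      # any residue
        elif core.startswith("{"):
            core = "[^" + core[1:-1] + "]"  # excluded residues
        if low is not None:
            core += "{" + low + ("," + high if high else "") + "}"
        parts.append(core)
    return "".join(parts)


# First hsp70 signature from the entry above.
hsp70_regex = prosite_to_regex("[IV]-D-L-G-T-[ST]-x-[SC]-")
print(hsp70_regex)

# Illustrative peptide (assumption, not taken from this document).
hit = re.search(hsp70_regex, "GKIIGIDLGTTNSCVAV")
print(hit.group(0), hit.start())
```

Ranges such as `x(2,3)` become `.{2,3}`, so variable-length gaps are handled by the same code path.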

[0839] 283. Heat shock hsp90 proteins family signature 

Prokaryotic and eukaryotic organisms respond to heat shock or other environmental stress by the induction of the
synthesis of proteins collectively known as heat-shock proteins (hsp) [1]. Amongst them is a family of proteins, with
an average molecular weight of 90 Kd, known as the hsp90 proteins. Proteins known to belong to this family are: -
Escherichia coli and other bacteria heat shock protein c62.5 (gene htpG). - Vertebrate hsp 90-alpha (hsp 86) and hsp
90-beta (hsp 84). - Drosophila hsp 82 (hsp 83). - Trypanosoma cruzi hsp 85. - Plants Hsp82 or Hsp83. - Yeast and
other fungi HSC82, and HSP82. - The endoplasmic reticulum protein 'endoplasmin' (also known as Erp99 in mouse,
GRP94 in hamster, and hsp 108 in chicken). The exact function of hsp90 proteins is not yet known. In higher eukaryotes,
hsp90 has been found associated with steroid hormone receptors, with tyrosine kinase oncogene products of several
retroviruses, with eIF2alpha kinase, and with actin and tubulin. Hsp90 are probable chaperonins that possess ATPase
activity [2,3]. As a signature pattern for the hsp90 family of proteins, a highly conserved region found in the N-terminal
part of these proteins was selected.

[0840] Consensus pattern: Y-x-[NQH]-K-[DE]-[IVA]-F-L-R-[ED]-

[ 1] Lindquist S., Craig E.A. Annu. Rev. Genet. 22:631-677(1988).
[ 2] Nadeau K., Das A., Walsh C.T. J. Biol. Chem. 268:1479-1487(1993).
[ 3] Jakob U., Buchner J. Trends Biochem. Sci. 19:205-211(1994).

[0841] 284. Helix-turn-helix (HTH3) 

[0842] This large family of DNA binding helix-turn-helix proteins includes Cro Swiss: P03036 and CI Swiss: P03034.
[0843] 285. Heme oxygenase signature 

Heme oxygenase (EC 1.14.99.3) (HO) [1] is the microsomal enzyme that, in animals, carries out the oxidation of heme;
it cleaves the heme ring at the alpha methene bridge to form biliverdin and carbon monoxide. Biliverdin is subsequently
converted to bilirubin by biliverdin reductase. In mammals there are three isozymes of heme oxygenase: HO-1 to HO-3.
The first two isozymes differ in their tissue expression and their inducibility: HO-1 is highly inducible by its substrate
heme and by various non-heme substances, while HO-2 is non-inducible. It has been suggested [2] that HO-2 could
be implicated in the production of carbon monoxide in the brain where it is said to act as a neurotransmitter. In the
genome of the chloroplast of red algae as well as in cyanobacteria, there is a heme oxygenase (gene pbsA) that is the
key enzyme in the synthesis of the chromophore part of the photosynthetic antennae [3]. A heme oxygenase is also
present in the bacterium Corynebacterium diphtheriae (gene hmuO), where it is involved in the acquisition of iron from
the host heme [4]. There is, in the central section of these enzymes, a well conserved region centered on a histidine
residue which is proposed to play a key role in binding the substrate heme at the active center of the enzyme. This
region was used as a signature pattern.

[0844] Consensus pattern: L-[IV]-A-H-[STACH]-Y-[STV]-[RT]-Y-[LIVM]-G [H binds the heme]

[ 1] Maines M.D. FASEB J. 2:2557-2568(1988).
[ 2] Barinaga M. Science 259:309-309(1993).
[ 3] Richaud C., Zabulon G. Proc. Natl. Acad. Sci. U.S.A. 94:11736-11741(1997).
[ 4] Schmitt M.P. J. Bacteriol. 179:838-845(1997).

[0845] 286. Hepatitis core antigen.
[0846] The core antigen of hepatitis viruses possesses a carboxyl terminus rich in arginine. On this basis it was
predicted that the core antigen would bind DNA [1]. There is some experimental evidence to support this [2].
[0847] [1] Pasek M, Goto T, Gilbert W, Zink B, Schaller H, Mckay P, Leadbetter G, Murray K; Nature 1979;282:
575-579. [2] Gallina A, Bonelli F, Zentilin L, Rindi G, Muttini M, Milanesi G; J Virol 1989;63:4645-4652.
[0848] 287. Histidine biosynthesis protein

[0849] Proteins involved in steps 4 and 6 of the histidine biosynthesis pathway are contained in this family. Histidine
is formed by several complex and distinct biochemical reactions catalysed by eight enzymes. The enzymes in this
Pfam entry are called His6 and His7 in eukaryotes and HisA and HisF in prokaryotes.
[0850] [1] Fani R, Tamburini E, Mori E, Lazcano A, Lio P, Barberio C, Casalone E, Cavalieri D, Perito B, Polsinelli
M. Gene 1997;197:9-17. [2] Fani R, Lio P, Chiarelli I, Bazzicalupo M. J Mol Evol 1994;38:489-495.
[0851] 288. Histone deacetylase family

[0852] Histones can be reversibly acetylated on several lysine residues. Regulation of transcription is caused in part
by this mechanism. Histone deacetylases catalyse the removal of the acetyl group. Histone deacetylases are related
to other proteins [1].

[0853] [1] Leipe DD, Landsman D. Nucleic Acids Res 1997;25:3693-3697.
[0854] 289. Histidinol dehydrogenase signature 

Histidinol dehydrogenase (EC 1.1.1.23) (HDH) catalyzes the terminal step in the biosynthesis of histidine in bacteria,
fungi and plants: the four-electron oxidation of L-histidinol to histidine. In bacteria HDH is a single chain polypeptide;
in fungi it is the C-terminal domain of a multifunctional enzyme which catalyzes three different steps of histidine
biosynthesis; and in plants it is expressed as a nuclear encoded protein precursor which is exported to the chloroplast [1].
As a signature pattern a highly conserved region located in the central part of HDH was selected. This region does not
correspond to the part of the enzyme that, in most, but not all HDH sequences contains a cysteine residue which, in
Salmonella typhimurium, has been said [2] to be important for the catalytic activity of the enzyme.
[0855] Consensus pattern: I-D-x(2)-A-G-P-[ST]-E-[LIVS]-[LIVMA](3)-[AC]-x(3)-A-x(4)-[LIVM]-[AV]-[SACL]-[DE]-
[LIVMFC]-[LIVM]-[SA]-x(2)-E-H-

[ 1] Nagai A., Ward E., Beck J., Tada S., Chang J.-Y., Scheidegger A., Ryals J. Proc. Natl. Acad. Sci. U.S.A. 88:
4133-4137(1991).
[ 2] Grubmeyer C.T., Gray W.R. Biochemistry 25:4778-4784(1986).

[0856] 290. Homoserine dehydrogenase signature 

Homoserine dehydrogenase (EC 1.1.1.3) (HDh) [1,2] catalyzes NAD-dependent reduction of aspartate beta-semialdehyde
into homoserine. This reaction is the third step in a pathway leading from aspartate to homoserine. The latter
participates in the biosynthesis of threonine and then isoleucine as well as in that of methionine. HDh is found either
as a single chain protein as in some bacteria and yeast, or as a bifunctional enzyme consisting of an N-terminal
aspartokinase domain and a C-terminal HDh domain as in bacteria such as Escherichia coli and in plants. As a signature
pattern, the best conserved region of HDh has been selected. This is a segment of 23 to 24 residues located in the
central section and that contains two conserved aspartate residues.

[0857] Consensus pattern: A-x(3)-G-[LIVMFY]-[STAG]-x(2,3)-[DNS]-P-x(2)-D-[LIVM]-x-G-x-D-x(3)-K-

[ 1] Thomas D., Barbey R., Surdin-Kerjan Y. FEBS Lett. 323:289-293(1993).
[ 2] Cami B., Clepet C., Patte J.-C. Biochimie 75:487-495(1993).

[0858] 291. haloacid dehalogenase-like hydrolase

[0859] This family is structurally different from the alpha/beta hydrolase family (abhydrolase). This family includes
L-2-haloacid dehalogenase, epoxide hydrolases and phosphatases. The structure of the family consists of two domains.
One is an inserted four helix bundle, which is the least well conserved region of the alignment, between residues
16 and 96 of Swiss: P24069. The rest of the fold is composed of the core alpha/beta domain. [1] Hisano T, Hata Y, Fujii
T, Liu JQ, Kurihara T, Esaki N, Soda K. J Biol Chem 1996; 271:20322-20330.

[0860] 292. DEAD and DEAH box families ATP-dependent helicases signatures (helicase_C)
A number of eukaryotic and prokaryotic proteins have been characterized [1,2,3] on the basis of their structural similarity.
They all seem to be involved in ATP-dependent, nucleic-acid unwinding. Proteins currently known to belong to
this family are: - Initiation factor eIF-4A. Found in eukaryotes, this protein is a subunit of a high molecular weight
complex involved in 5'cap recognition and the binding of mRNA to ribosomes. It is an ATP-dependent RNA-helicase.
- PRP5 and PRP28. These yeast proteins are involved in various ATP-requiring steps of the pre-mRNA splicing process.
- PL10, a mouse protein expressed specifically during spermatogenesis. - An3, a Xenopus putative RNA helicase,
closely related to PL10. - SPP81/DED1 and DBP1, two yeast proteins probably involved in pre-mRNA splicing and
related to PL10. - Caenorhabditis elegans helicase glh-1. - MSS116, a yeast protein required for mitochondrial splicing.
- SPB4, a yeast protein involved in the maturation of 25S ribosomal RNA. - p68, a human nuclear antigen. p68 has
ATPase and DNA-helicase activities in vitro. It is involved in cell growth and division. - Rm62 (p62), a Drosophila
putative RNA helicase related to p68. - DBP2, a yeast protein related to p68. - DHH1, a yeast protein. - DRS1, a yeast
protein involved in ribosome assembly. - MAK5, a yeast protein involved in maintenance of dsRNA killer plasmid. -
ROK1, a yeast protein. - ste13, a fission yeast protein. - Vasa, a Drosophila protein important for oocyte formation and
specification of embryonic posterior structures. - Me31B, a Drosophila maternally expressed protein of unknown function.
- dbpA, an Escherichia coli putative RNA helicase. - deaD, an Escherichia coli putative RNA helicase which can
suppress a mutation in the rpsB gene for ribosomal protein S2. - rhlB, an Escherichia coli putative RNA helicase. -
rhlE, an Escherichia coli putative RNA helicase. - srmB, an Escherichia coli protein that shows RNA-dependent ATPase
activity. It probably interacts with 23S ribosomal RNA. - Caenorhabditis elegans hypothetical proteins T26G10.1,
ZK512.2 and ZK686.2. - Yeast hypothetical protein YHR065c. - Yeast hypothetical protein YHR169w. - Fission yeast
hypothetical protein SpAC31A2.07c. - Bacillus subtilis hypothetical protein yxiN. All these proteins share a number of
conserved sequence motifs. Some of them are specific to this family while others are shared by other ATP-binding
proteins or by proteins belonging to the helicases 'superfamily' [4,E1]. One of these motifs, called the 'D-E-A-D-box',
represents a special version of the B motif of ATP-binding proteins. Some other proteins belong to a subfamily which
have His instead of the second Asp and are thus said to be 'D-E-A-H-box' proteins [3,5,6,E1]. Proteins currently known
to belong to this subfamily are: - PRP2, PRP16, PRP22 and PRP43. These yeast proteins are all involved in various
ATP-requiring steps of the pre-mRNA splicing process. - Fission yeast prh1, which may be involved in pre-mRNA splicing.
- Male-less (mle), a Drosophila protein required in males, for dosage compensation of X chromosome linked genes. -
RAD3 from yeast. RAD3 is a DNA helicase involved in excision repair of DNA damaged by UV light, bulky adducts or
cross-linking agents. Fission yeast rad15 (rhp3) and mammalian DNA excision repair protein XPD (ERCC-2) are the
homologs of RAD3. - Yeast CHL1 (or CTF1), which is important for chromosome transmission and normal cell cycle
progression in G(2)/M. - Yeast TPS1. - Yeast hypothetical protein YKL078w. - Caenorhabditis elegans hypothetical
proteins C06E1.10 and K03H1.2. - Poxviruses' early transcription factor 70 Kd subunit which acts with RNA polymerase
to initiate transcription from early gene promoters. - I8, a putative vaccinia virus helicase. - hrpA, an Escherichia coli
putative RNA helicase. Signature patterns were developed for both subfamilies.
[0861] Consensus pattern: [LIVMF](2)-D-E-A-D-[RKEN]-x-[LIVMFYGSTN]-
Consensus pattern: [GSAH]-x-[LIVMF](3)-D-E-[ALIV]-H-[NECR]-

Note: proteins belonging to this family also contain a copy of the ATP/GTP-binding motif 'A' (P-loop) (see the relevant
entry <PDOC00017>).

[ 1] Schmid S.R., Linder P. Mol. Microbiol. 6:283-292(1992).
[ 2] Linder P., Lasko P., Ashburner M., Leroy P., Nielsen P.J., Nishi K., Schnier J., Slonimski P.P. Nature 337:121-122
(1989).
[ 3] Wassarman D.A., Steitz J.A. Nature 349:463-464(1991).
[ 4] Hodgman T.C. Nature 333:22-23(1988) and Nature 333:578-578(1988) (Errata).
[ 5] Harosh I., Deschavanne P. Nucleic Acids Res. 19:6331-6331(1991).
[ 6] Koonin E.V., Senkevich T.G. J. Gen. Virol. 73:989-993(1992).
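
The two subfamily signatures differ only around the fourth residue of the box, so a sequence can be assigned to one subfamily or the other by testing both patterns. The following is a hedged sketch: the regular expressions are direct transcriptions of the two consensus patterns in this entry, while `classify_helicase` and the test peptides are hypothetical and for illustration only.

```python
import re

# Direct regex transcriptions of the two consensus patterns of this entry.
DEAD_BOX = re.compile(r"[LIVMF]{2}DEAD[RKEN].[LIVMFYGSTN]")
DEAH_BOX = re.compile(r"[GSAH].[LIVMF]{3}DE[ALIV]H[NECR]")


def classify_helicase(protein):
    """Return 'DEAD', 'DEAH', 'both' or 'none' for a protein sequence."""
    seq = protein.upper()
    has_dead = DEAD_BOX.search(seq) is not None
    has_deah = DEAH_BOX.search(seq) is not None
    if has_dead and has_deah:
        return "both"
    if has_dead:
        return "DEAD"
    if has_deah:
        return "DEAH"
    return "none"


# Hypothetical peptides, each built to contain one of the two boxes.
print(classify_helicase("GSVLDEADRMLDM"))
print(classify_helicase("TGVLIVDEAHERSL"))
print(classify_helicase("MKTAYIAKQR"))
```

A match to either pattern is only a signature hit; it does not by itself establish helicase activity.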

[0862] 293. Heme-binding domain in cytochrome b5 and oxidoreductases (heme_1)

[0863] Cytochrome b5 is a membrane-bound heme protein which acts as an electron carrier for several membrane-bound
oxygenases [1]. There are two homologous forms of b5, one found in microsomes and one found in the outer
membrane of mitochondria. Two conserved histidine residues serve as axial ligands for the heme group. The structure
of a number of oxidoreductases consists of the juxtaposition of a heme-binding domain homologous to that of b5 and
either a flavodehydrogenase or a molybdopterin domain. These enzymes are:

- Lactate dehydrogenase (EC 1.1.2.3) [2], an enzyme that consists of a flavodehydrogenase domain and a heme-binding
domain called cytochrome b2.
- Nitrate reductase (EC 1.6.6.1), a key enzyme involved in the first step of nitrate assimilation in plants, fungi and
bacteria [3,4]. Consists of a molybdopterin domain (see <PDOC00484>), a heme-binding domain called cytochrome
b557, as well as a cytochrome reductase domain.
- Sulfite oxidase (EC 1.8.3.1) [5], which catalyzes the terminal reaction in the oxidative degradation of sulfur-containing
amino acids. Also consists of a molybdopterin domain and a heme-binding domain.

This family of proteins also includes:

- TU-36B, a Drosophila muscle protein of unknown function [6].
- Fission yeast hypothetical protein SpAC1F12.10c.
- Yeast hypothetical protein YMR073C.
- Yeast hypothetical protein YMR272C.

[0864] A segment was used which includes the first of the two histidine heme ligands, as a signature pattern for the
heme-binding domain of cytochrome b5 family.

[0865] Consensus pattern: [FY]-[LIVMK]-x(2)-H-P-[GA]-G [H is a heme axial ligand]-

[1] Ozols J. Biochim. Biophys. Acta 997:121-130(1989).
[2] Guiard B. EMBO J. 4:3265-3272(1985).
[3] Calza R., Huttner E., Vincentz M., Rouze P., Galangau R., Vaucheret H., Cherel I., Meyer C., Kronenberger J.,
Caboche M. Mol. Gen. Genet. 209:552-562(1987).
[4] Crawford N.M., Smith M., Bellissimo D., Davis R.W. Proc. Natl. Acad. Sci. U.S.A. 85:5006-5010(1988).
[5] Guiard B., Lederer F. Eur. J. Biochem. 100:441-453(1979).
[6] Levin R.J., Boychuk P.L., Croniger C.M., Kazzaz J.A., Rozek C.E. Nucleic Acids Res. 17:6349-6367(1989).

[0866] 294. Hexapeptide-repeat containing-transferases signature

On the basis of sequence similarity, a number of transferases have been proposed [1,2,3,4] to belong to a single family.
These proteins are: - Serine acetyltransferase (EC 2.3.1.30) (SAT) (gene cysE), an enzyme involved in cysteine
biosynthesis. - Azotobacter chroococcum nitrogen fixation protein nifP. NifP is most probably a SAT involved in the
optimization of nitrogenase activity. - Escherichia coli thiogalactoside acetyltransferase (EC 2.3.1.18) (gene lacA), an
enzyme involved in the biosynthesis of lactose. - UDP-N-acetylglucosamine acyltransferase (EC 2.3.1.129) (gene lpxA),
an enzyme involved in the biosynthesis of lipid A, a phosphorylated glycolipid that anchors the lipopolysaccharide to
the outer membrane of the cell. - UDP-3-O-[3-hydroxymyristoyl] glucosamine N-acyltransferase (EC 2.3.1.-) (gene
lpxD or firA), which is also involved in the biosynthesis of lipid A. - Chloramphenicol acetyltransferase (CAT) (EC
2.3.1.28) from Agrobacterium tumefaciens, Bacillus sphaericus, Escherichia coli plasmid IncFII NR79, Pseudomonas
aeruginosa, Staphylococcus aureus plasmid pIP630. These CAT are not evolutionary related to the main family of CAT
(see <PDOC00093>). - Rhizobium nodulation protein nodL. NodL is an acetyltransferase involved in the O-acetylation
of Nod factors. - Bacterial maltose O-acetyltransferase (EC 2.3.1.79). - Bacterial tetrahydrodipicolinate N-succinyltransferase
(EC 2.3.1.117) (gene dapD) which catalyzes the fourth step in the biosynthesis of diaminopimelate and
lysine from aspartate semialdehyde. - Bacterial N-acetylglucosamine-1-phosphate uridyltransferase (EC 2.7.7.23)
(gene glmU or gcaD or tms), an enzyme involved in peptidoglycan and lipopolysaccharide biosynthesis. - Staphylococcus
aureus protein capG which is involved in biosynthesis of type 1 capsular polysaccharide. - Yeast hypothetical
protein YJL218w, which is highly similar to Escherichia coli lacA. - Fission yeast hypothetical protein SpAC18B11.09c.
- Methanococcus jannaschii hypothetical protein MJ1064. These proteins have been shown [3,4] to contain a repeat
structure composed of tandem repeats of a [LIV]-G-x(4) hexapeptide which, in the tertiary structure of lpxA [5], has
been shown to form a left-handed parallel beta helix. Our signature pattern is based on a fourfold repeat of this hexapeptide.

[0867] Consensus pattern: [LIVM]-[GAED]-x(2)-[STAV]-x-[LIV]-x(3)-[LIVAC]-x-[LIV]-[GAED]-x(2)-[STAVR]-x-[LIV]-
[GAED]-x(2)-[STAV]-x-[LIV]-x(3)-[LIV]-

[ 1] Downie J.A. Mol. Microbiol. 3:1649-1651(1989).
[ 2] Parent R., Roy P.H. J. Bacteriol. 174:2891-2897(1992).
[ 3] Vaara M. FEMS Microbiol. Lett. 97:249-254(1992).
[ 4] Vuorio R., Haerkonen T., Tolvanen M., Vaara M. FEBS Lett. 337:289-292(1994).
[ 5] Raetz C.R.H., Roderick S.L. Science 270:997-1000(1995).
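
The tandem arrangement of the [LIV]-G-x(4) hexapeptide can be illustrated by counting how many copies occur back to back. This sketch deliberately uses the simplified hexapeptide from the description, not the full consensus pattern; `longest_tandem_run` and the test peptide are hypothetical.

```python
import re

# One or more back-to-back copies of the simplified hexapeptide [LIV]-G-x(4).
TANDEM_HEXAPEPTIDES = re.compile(r"(?:[LIV]G.{4})+")


def longest_tandem_run(protein):
    """Return the largest number of consecutive hexapeptide copies found."""
    best = 0
    for start in range(len(protein)):
        # Anchor the run at each offset so every reading frame is tried.
        match = TANDEM_HEXAPEPTIDES.match(protein, start)
        if match:
            best = max(best, (match.end() - match.start()) // 6)
    return best


# Hypothetical peptide: two flanking residues, four hexapeptides, two more residues.
peptide = "MK" + "IGDNVT" + "VGANAV" + "LGKGSR" + "IGENCF" + "PP"
print(longest_tandem_run(peptide))
```

A run of four or more copies corresponds to the fourfold repeat on which the signature is based, although the full consensus pattern is more permissive at the second position of each copy.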

[0868] 295. Hexokinases signature. Hexokinase (EC 2.7.1.1) [1,2] is an important glycolytic enzyme that catalyzes
the phosphorylation of keto- and aldohexoses (e.g. glucose, mannose and fructose) using MgATP as the phosphoryl
donor. In vertebrates there are four major isoenzymes, commonly referred to as types I, II, III and IV. Type IV hexokinase,
which is often incorrectly designated glucokinase [3], is only expressed in liver and pancreatic beta-cells and plays an
important role in modulating insulin secretion; it is a protein of a molecular mass of about 50 Kd. Hexokinases of types
I to III, which have low Km values for glucose, have a molecular mass of about 100 Kd. Structurally they consist of a
very small N-terminal hydrophobic membrane-binding domain followed by two highly similar domains of 450 residues.
The first domain has lost its catalytic activity and has evolved into a regulatory domain. In yeast there are three different
isozymes: hexokinase PI (gene HXK1), PII (gene HXK2), and glucokinase (gene GLK1). All three proteins have a
molecular mass of about 50 Kd. All these enzymes contain one (or two in the case of types I to III isozymes) strongly
conserved region which has been shown [4] to be involved in substrate binding. A pattern from that region has been
derived.

[0869] Consensus pattern: [LIVM]-G-F-[TN]-F-S-[FY]-P-x(5)-[LIVM]-[DNST]-x(3)-[LIVM]-x(2)-W-T-K-x-[LF]-

[0870] [ 1] Middleton R.J. Biochem. Soc. Trans. 18:180-183(1990).
[ 2] Griffin L.D., Gelb B.D., Wheeler D.A., Davison D., Adams V., McCabe E.R. Genomics 11:1014-1024(1991).
[ 3] Cornish-Bowden A., Luz Cardenas M. Trends Biochem. Sci. 16:281-282(1991).
[ 4] Schirch D.M., Wilson J.E. Arch. Biochem. Biophys. 254:385-396(1987).
[0871] 296. Histone H2A signature (his1)

Histone H2A is one of the four histones, along with H2B, H3 and H4, which forms the eukaryotic nucleosome core.
Using alignments of histone H2A sequences [1,2,E1], a conserved region in the N-terminal part of H2A was selected
as a signature pattern. This region is conserved both in classical S-phase regulated H2A's and in variant histone H2A's
which are synthesized throughout the cell cycle.
[0872] Consensus pattern: [AC]-G-L-x-F-P-V-

[ 1] Wells D.E., Brown D. Nucleic Acids Res. 19:2173-2188(1991).
[ 2] Thatcher T.H., Gorovsky M.A. Nucleic Acids Res. 22:174-179(1994).


[0873] Histone H4 signature (his2) 

[0874] Histone H4 is one of the four histones, along with H2A, H2B and H3, which forms the eukaryotic nucleosome 
core. Along with H3, it plays a central role in nucleosome formation. The sequence of histone H4 has remained almost 
invariant in more then 2 billion years of evolution [1 JE1J. The region used as a signature pattern is a pentapeptide found 
1$ in positions 1 4 to 1 8 of all H4sequences. It contains a lysine residue which is often acotylatod [2] and a histidino residue 
which is implicated in DNA-binding [3]. 
[0875] Consensus pattern: G-A-K-R-H- 

[ 1] Thatcher T.H., Gorovsky M.A. Nucleic Acids Res. 22:174-179(1994). 
[ 2] Doenecke D., Gallwitz D. Mol. Cell. Biochem. 44:113-128(1982).

[ 3] Ebralidse K.K., Grachev S.A., Mirzabekov A.D. Nature 331:365-367(1988).
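The stated location of the H4 pentapeptide (positions 14 to 18) can be checked with a few lines of code. This is a sketch under assumptions: the N-terminal fragment below is the commonly reported start of a mature histone H4 (initiator methionine removed) and serves only as illustrative input; the variable names are hypothetical.

```python
import re

# Assumed illustrative input: N-terminal residues of a mature histone H4.
H4_N_TERMINUS = "SGRGKGGKGLGKGGAKRHRKVLRDN"

# The consensus pattern G-A-K-R-H contains no wildcards, so it is its own regex.
match = re.search("GAKRH", H4_N_TERMINUS)

start = match.start() + 1   # convert the 0-based index to a 1-based residue position
end = match.end()           # the exclusive 0-based end equals the 1-based last position
print(start, end)
```

With this fragment the pentapeptide spans residues 14 to 18, so the lysine of the pattern falls at position 16 and the histidine at position 18.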

[0876] Histone H3 signatures (his3) 

Histone H3 is one of the four histones, along with H2A, H2B and H4, which form the eukaryotic nucleosome core. It
is a highly conserved protein of 135 amino acid residues [1,2,E1]. The following proteins have been found to contain
a C-terminal H3-like domain: - Mammalian centromeric protein CENP-A [3]. Could act as a core histone necessary for
the assembly of centromeres. - Yeast chromatin-associated protein CSE4 [4]. - Caenorhabditis elegans chromosome
III encodes two highly related proteins (F54C8.2 and F58A4.3) whose C-terminal section is evolutionarily related to the
last 100 residues of H3. The function of these proteins is not yet known. Two signature patterns were developed. The
first one corresponds to a perfectly conserved heptapeptide in the N-terminal part of H3. The second one is derived
from a conserved region in the central section of H3.
[0877] Consensus pattern: K-A-P-R-K-Q-L- 
Consensus pattern: P-F-x-[RA]-L-[VA]-[KRQ]-[DEG]-[IV]-

[ 1] Wells D.E., Brown D. Nucleic Acids Res. 19:2173-2188(1991).

[ 2] Thatcher T.H., Gorovsky M.A. Nucleic Acids Res. 22:174-179(1994). 

[ 3] Sullivan K.F., Hechenberger M., Masri K. J. Cell Biol. 127:581-592(1994). 

[ 4] Stoler S., Keith K.C., Curnick K.E., Fitzgerald-Hayes M. Genes Dev. 9:573-586(1995).
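When an entry defines two signature patterns, as for histone H3, a sequence can be scanned for both and the positions reported. The sketch below assumes the two H3 patterns read K-A-P-R-K-Q-L and P-F-x-[RA]-L-[VA]-[KRQ]-[DEG]-[IV], hand-translated to regular expressions. The 80-residue fragment is the commonly reported N-terminal region of a mature histone H3 and is used only as illustrative input; the names H3_SIGNATURES, H3_FRAGMENT and scan are hypothetical.

```python
import re

# Hand-translated regular expressions for the two H3 signature patterns.
H3_SIGNATURES = {
    "N-terminal heptapeptide": "KAPRKQL",
    "central region": "PF.[RA]L[VA][KRQ][DEG][IV]",
}

# Assumed illustrative input: first 80 residues of a mature histone H3.
H3_FRAGMENT = ("ARTKQTARKSTGGKAPRKQLATKAARKSAPATGGVKKPHR"
               "YRPGTVALREIRRYQKSTELLIRKLPFQRLVREIAQDFKT")

def scan(sequence, signatures):
    """Return {name: (first, last)} with 1-based positions, or None when absent."""
    hits = {}
    for name, regex in signatures.items():
        m = re.search(regex, sequence)
        hits[name] = (m.start() + 1, m.end()) if m else None
    return hits

hits = scan(H3_FRAGMENT, H3_SIGNATURES)
for name, span in hits.items():
    print(name, span)
```

Requiring both signatures to be present is stricter than requiring either one; which rule is appropriate depends on how many false positives can be tolerated.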

[0878] Histone H2B signature (his4)

[0879] Histone H2B is one of the four histones, along with H2A, H3 and H4, which form the eukaryotic nucleosome
core. Using alignments of histone H2B sequences [1,2,E1], a conserved region was selected in the C-terminal part
of H2B.

[0880] Consensus pattern: [KR]-E-[LIVM]-[EQ]-T-x(2)-[KR]-x-[LIVM](2)-x-[PAG]-[DE]-L-x-[KR]-H-A-[LIVM]-[STA]-E-G-

[ 1] Wells D.E., Brown D. Nucleic Acids Res. 19:2173-2188(1991). 

[ 2] Thatcher T.H., Gorovsky M.A. Nucleic Acids Res. 22:174-179(1994).

[0881] 297. 'Homeobox' domain signature and profile (home1)

The 'homeobox' is a protein domain of 60 amino acids [1 to 5,E1] first identified in a number of Drosophila homeotic
and segmentation proteins. It has since been found to be extremely well conserved in many other animals, including
vertebrates. This domain binds DNA through a helix-turn-helix type of structure. Some of the proteins which contain a
homeobox domain play an important role in development. Most of these proteins are known to be sequence-specific
DNA-binding transcription factors. The homeobox domain has also been found to be very similar to a region of the
yeast mating type proteins. These are sequence-specific DNA-binding proteins that act as master switches in yeast
differentiation by controlling gene expression in a cell type-specific fashion. A schematic representation of the
homeobox domain is shown below. The helix-turn-helix region is shown by the symbols 'H' (for helix), and 't' (for turn).
xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxHHHHHHHHtttHHHHHHHHHxxxxxxxxxx
|        |         |         |         |         |         |
1        10        20        30        40        50        60

The pattern that was developed to detect homeobox sequences is 24 residues long and spans
positions 34 to 57 of the homeobox domain.

[0882] Consensus pattern: [LIVMFYG]-[ASLVR]-x(2)-[LIVMSTACN]-x-[LIVM]-x(4)-[LIV]-[RKNQESTAIY]-[LIVFSTNKH]-W-[FYVC]-x-[NDQTAH]-x(5)-[RKNAIMW]-

[ 1] Gehring W.J. (In) Guidebook to the homeobox genes, Duboule D., Ed., pp1-10, Oxford University Press, Oxford,
(1994).

[ 2] Buerglin T.R. (In) Guidebook to the homeobox genes, Duboule D., Ed., pp25-72, Oxford University Press, Oxford,
(1994).

[ 3] Gehring W.J. Trends Biochem. Sci. 17:277-280(1992).
[ 4] Gehring W.J., Hiromi Y. Annu. Rev. Genet. 20:147-173(1986).
[ 5] Schofield P.N. Trends Neurosci. 10:3-6(1987).
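The statement that the homeobox pattern is 24 residues long and spans positions 34 to 57 can be cross-checked by counting pattern elements. The sketch below is an illustration under assumptions: the pattern string is a cleaned-up reading of the homeobox consensus pattern, and pattern_length is a hypothetical helper that returns the minimum and maximum number of residues a pattern can span.

```python
import re

def pattern_length(pattern):
    """Return (min_len, max_len) of the residue span described by a consensus pattern."""
    lo_total = hi_total = 0
    for element in pattern.strip().strip("-").split("-"):
        # A trailing (n) or (n,m) gives the repeat count; otherwise the element counts once.
        m = re.search(r"\((\d+)(?:,(\d+))?\)$", element.strip())
        lo = int(m.group(1)) if m else 1
        hi = int(m.group(2)) if m and m.group(2) else lo
        lo_total += lo
        hi_total += hi
    return lo_total, hi_total

# Assumed reading of the homeobox consensus pattern.
HOMEOBOX = ("[LIVMFYG]-[ASLVR]-x(2)-[LIVMSTACN]-x-[LIVM]-x(4)-[LIV]-"
            "[RKNQESTAIY]-[LIVFSTNKH]-W-[FYVC]-x-[NDQTAH]-x(5)-[RKNAIMW]-")

print(pattern_length(HOMEOBOX))   # minimum and maximum span in residues
print(57 - 34 + 1)                # number of positions from 34 to 57 inclusive
```

Both values agree at 24, which is a quick sanity check that no element of the pattern was lost in transcription.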

[0883] 'Homeobox' antennapedia-type protein signature (home2)

The homeotic Hox proteins are sequence-specific transcription factors. They are part of a developmental regulatory
system that provides cells with specific positional identities on the anterior-posterior (A-P) axis [1]. The Hox proteins
contain a 'homeobox' domain. In Drosophila and other insects, there are eight different Hox genes that are encoded
in two gene complexes, ANT-C and BX-C. In vertebrates there are 38 genes organized in four complexes. In six of the
eight Drosophila Hox genes the homeobox domain is highly similar and a conserved hexapeptide is found five to sixteen
amino acids upstream of the homeobox domain. The six Drosophila proteins that belong to this group are antennapedia
(Antp), abdominal-A (abd-A), deformed (Dfd), proboscipedia (pb), sex combs reduced (scr) and ultrabithorax (ubx) and
are collectively known as the 'antennapedia' subfamily. In vertebrates the corresponding Hox genes are known [2] as
Hox-A2, A3, A4, A5, A6, A7, Hox-B1, B2, B3, B4, B5, B6, B7, B8, Hox-C4, C5, C6, C8, Hox-D1, D3, D4 and
D8. Caenorhabditis elegans lin-39 and mab-5 are also members of the 'antennapedia' subfamily. As a signature pattern
for this subfamily of homeobox proteins, the conserved hexapeptide was used.
[0884] Consensus pattern: [LIVMFE]-[FY]-P-W-M-[KRQTA]-

[ 1] McGinnis W., Krumlauf R. Cell 68:283-302(1992). 
[ 2] Scott M.P. Cell 71:551-553(1992). 

[0885] 'Homeobox' engrailed-type protein signature (home3)

[0886] Most proteins which contain a 'homeobox' domain can be classified [1,2], on the basis of their sequence
characteristics, in three subfamilies: engrailed, antennapedia and paired. Proteins currently known to belong to the engrailed
subfamily are: - Drosophila segmentation polarity protein engrailed (en) which specifies the body segmentation pattern
and is required for the development of the central nervous system. - Drosophila invected protein (inv). - Silk moth
proteins engrailed and invected, which may be involved in the compartmentalization of the silk gland. - Honeybee E30
and E60. - Grasshopper (Schistocerca americana) G-En. - Mammalian and birds En-1 and En-2. - Zebrafish Eng-1,
-2 and -3. - Sea urchin (Tripneustes gratilla) SU-HB-en. - Leech (Helobdella triserialis) Ht-En. - Caenorhabditis elegans
ceh-16. Engrailed homeobox proteins are characterized by the presence of a conserved region of some 20 amino-acid
residues located at the C-terminal of the 'homeobox' domain. As a signature pattern for this subfamily of proteins, a
stretch of eight perfectly conserved residues in this region was used.

[0887] Consensus pattern: L-M-A-[EQ]-G-L-Y-N-

[ 1] Scott M.P., Tamkun J.W., Hartzell G.W. III Biochim. Biophys. Acta 989:25-48(1989).
[ 2] Gehring W.J. Science 236:1245-1252(1987). 
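A short pattern without wildcards describes a small, finite set of peptides that can be listed explicitly, which is convenient for exact-match lookups. The sketch below enumerates the peptides described by the engrailed pattern L-M-A-[EQ]-G-L-Y-N; the helper name expand is hypothetical, and the function deliberately rejects wildcards, repeats and exclusion classes rather than guessing at them.

```python
from itertools import product

def expand(pattern):
    """List every peptide matched by a pattern made only of fixed residues and [..] classes."""
    choices = []
    for element in pattern.strip().strip("-").split("-"):
        element = element.strip()
        if element == "x" or "(" in element or element.startswith("{"):
            raise ValueError("only fixed residues and [..] classes are supported")
        if element.startswith("["):
            choices.append(list(element[1:-1]))   # each residue allowed at this position
        else:
            choices.append([element])             # a single fixed residue
    return ["".join(combo) for combo in product(*choices)]

peptides = expand("L-M-A-[EQ]-G-L-Y-N-")
print(peptides)
```

For this pattern only the fourth position varies, so the list contains two octapeptides.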

[0888] 298. Isocitrate lyase signature (ICL)

Isocitrate lyase (EC 4.1.3.1) [1,2] is an enzyme that catalyzes the conversion of isocitrate to succinate and glyoxylate.
This is the first step in the glyoxylate bypass, an alternative to the tricarboxylic acid cycle in bacteria, fungi and plants.
A cysteine, a histidine and a glutamate or aspartate have been found to be important for the enzyme's catalytic activity.
Only one cysteine residue is conserved between the sequences of the fungal, plant and bacterial enzymes; it is located
in the middle of a conserved hexapeptide that can be used as a signature pattern for this type of enzyme.

[0889] Consensus pattern: K-[KR]-C-G-H-[LMQ] [C is a putative active site residue]-

[ 1] Beeching J.R. Protein Seq. Data Anal. 2:463-466(1989). 
[ 2] Atomi H., Ueda M., Hikida M., Hishida T., Teranishi Y., Tanaka A. J. Biochem. 107:262-266(1990).
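Where a pattern is annotated with a functional residue, as with the putative active site cysteine of the isocitrate lyase hexapeptide, a capturing group can report the position of that residue rather than the position of the whole match. The sketch below assumes the pattern reads K-[KR]-C-G-H-[LMQ]; the function name and the test sequences are synthetic and purely illustrative.

```python
import re

# K-[KR]-C-G-H-[LMQ], with the annotated cysteine placed in a capturing group.
ICL_SIGNATURE = re.compile(r"K[KR](C)GH[LMQ]")

def active_site_positions(sequence):
    """Return the 1-based positions of the annotated cysteine in every signature match."""
    return [m.start(1) + 1 for m in ICL_SIGNATURE.finditer(sequence)]

# Synthetic sequences, not real isocitrate lyase sequences.
print(active_site_positions("MSAAKKCGHLAGKV"))
print(active_site_positions("KKAGHL"))
```

A match only indicates that the hexapeptide is present; whether the cysteine is in fact catalytic remains subject to the hedge stated in the pattern annotation.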
[0890] 299. Initiation factor 2 subunit 

[0891] This family includes initiation factor 2B alpha, beta and delta subunits from eukaryotes, related proteins from
archaebacteria and IF-2 from prokaryotes. Initiation factor 2 binds to Met-tRNA, GTP and the small ribosomal subunit.
[0892] [1] Kyrpides NC, Woese CR, Proc Natl Acad Sci U S A 1998;95:3726-3730. 
[0893] 300. Initiation factor 3 signature 

Initiation factor 3 (IF-3) (gene infC) [1] is one of the three factors required for the initiation of protein biosynthesis in
bacteria. IF-3 is thought to function as a fidelity factor during the assembly of the ternary initiation complex which consists
of the 30S ribosomal subunit, the initiator tRNA and the messenger RNA. IF-3 binds to the 30S ribosomal subunit; it
is a basic protein of 141 to 212 residues. The chloroplast initiation factor IF-3(chl) is a protein that enhances the
poly(A,U,G)-dependent binding of the initiator tRNA to chloroplast ribosomal 30S subunits. In its mature form it is a protein
of about 400 residues whose central section is evolutionarily related to the sequence of bacterial IF-3 [2]. As a signature
pattern, a highly conserved region located in the central section of bacterial IF-3 and of IF-3(chl) was selected.
[0894] Consensus pattern: [KR]-[LIVM](2)-[DN]-[FY]-[GSN]-[KR]-[LIVMFYS]-x-[FY]-[DEQTH]-x(2)-[KRQ]-

[ 1] Liveris D., Schwartz J.J., Geertman R., Schwartz I. FEMS Microbiol. Lett. 112:211-216(1993). 
[ 2] Lin Q., Ma L., Burkhart W., Spremulli L.L. J. Biol. Chem. 269:9436-9444(1994).

[0895] 301. Imidazoleglycerol-phosphate dehydratase signatures (IGPD) 

Imidazoleglycerol-phosphate dehydratase (EC 4.2.1.19) is the enzyme that catalyzes the seventh step in the
biosynthesis of histidine in bacteria, fungi and plants. In most organisms it is a monofunctional protein of about 22 to 29 Kd.
In some bacteria such as Escherichia coli it is the C-terminal domain of a bifunctional protein that includes a
histidinol-phosphatase domain [1]. Two signature patterns were developed that each include two consecutive histidine residues.
[0896] Consensus pattern: [LIVMY]-[DE]-x-H-H-x(2)-E-x(2)-[GCA]-[LIVM]-[STAC]-[LIVM]-
Consensus pattern: G-x-[DN]-x-H-H-x(2)-E-[STAGC]-x-[FY]-K-

[0897] [ 1] Carlomagno M.S., Chiariotti L., Alifano P., Nappo A.G., Bruni C.B. J. Mol. Biol. 203:585-606(1988).
[0898] 302. Indole-3-glycerol phosphate synthase signature (IGPS)

Indole-3-glycerol phosphate synthase (EC 4.1.1.48) (IGPS) catalyzes the fourth step in the biosynthesis of tryptophan:
the ring closure of 1-(2-carboxy-phenylamino)-1-deoxyribulose into indol-3-glycerol-phosphate. In some bacteria, IGPS
is a single chain enzyme. In others, such as Escherichia coli, it is the N-terminal domain of a bifunctional enzyme
that also catalyzes N-(5'-phosphoribosyl)anthranilate isomerase (PRAI) activity, the third step of tryptophan
biosynthesis. In fungi, IGPS is the central domain of a trifunctional enzyme that also contains a PRAI C-terminal domain and a
glutamine amidotransferase N-terminal domain. The N-terminal section of IGPS contains a highly conserved region
which X-ray crystallography studies [1] have shown to be part of the active site cavity. This region was used as a
signature pattern for IGPS.

[0899] Consensus pattern: [LIVMFY]-[LIVMC]-x-E-[LIVMFYC]-K-[KRSP]-[STAK]-S-P-[ST]-x(3)-[LIVMFYST]-
[0900] [ 1] Wilmanns M., Priestle J.P., Niermann T., Jansonius J.N. J. Mol. Biol. 223:477-507(1992).
[0901] 303. (IL2) Interleukin 2. 31 members 

[0902] 304. (ILVD EDD) Dihydroxy-acid and 6-phosphogluconate dehydratases. Two dehydratases have been
shown [1] to be evolutionarily related: - Dihydroxy-acid dehydratase (EC 4.2.1.9) (gene ilvD or ILV3) which catalyzes
the fourth step in the biosynthesis of isoleucine and valine, the dehydration of 2,3-dihydroxy-isovaleric acid into
alpha-ketoisovaleric acid. - 6-phosphogluconate dehydratase (EC 4.2.1.12) (gene edd) which catalyzes the first step in the
Entner-Doudoroff pathway, the dehydration of 6-phospho-D-gluconate into 6-phospho-2-dehydro-3-deoxy-D-gluconate.
- Escherichia coli hypothetical protein yjhG. Both enzymes are proteins of about 600 amino acid residues. Two
highly conserved regions have been developed as signature patterns. The first pattern is located in the N-terminal part
and contains a cysteine that could be involved in the binding of a 2Fe-2S iron-sulfur cluster [2]. The second pattern is
located in the C-terminal half.

[0903] Consensus pattern: C-D-K-x(2)-P-[GA]-x(3)-[GA] [The C could be a 2Fe-2S ligand]
Consensus pattern: [SA]-L-[LIVM]-T-D-[GA]-R-[LIVMF]-S-[GA]-[GAV]-[ST]-

[0904] [ 1] Egan S.E., Fliege R., Tong S., Shibata A., Wolf R.E. Jr., Conway T. J. Bacteriol. 174:4638-4646(1992).
[ 2] Velasco J.A., Cansado J., Pena M.C., Kawakami T., Laborda J., Notario V. Gene 137:179-185(1993).
[0905] 305. IMP dehydrogenase / GMP reductase signature 

IMP dehydrogenase (EC 1.1.1.205) (IMPDH) catalyzes the rate-limiting reaction of de novo GTP biosynthesis, the
NAD-dependent oxidation of IMP into XMP [1]. Inhibition of IMP dehydrogenase activity results in the cessation of DNA
synthesis. As IMP dehydrogenase is associated with cell proliferation, it is a possible target for cancer chemotherapy.
Mammalian and bacterial IMPDHs are tetramers of identical chains. There are two IMP dehydrogenase isozymes in
humans [2]. GMP reductase (EC 1.6.6.8) catalyzes the irreversible and NADPH-dependent reductive deamination of
GMP into IMP [3]. It converts nucleobase, nucleoside and nucleotide derivatives of G to A nucleotides, and maintains
intracellular balance of A and G nucleotides. IMP dehydrogenase and GMP reductase share many regions of sequence
similarity. One of these regions is centered on a cysteine residue thought [3] to be involved in binding IMP. This region
was used as a signature pattern.

[0906] Consensus pattern: [LIVM]-[RK]-[LIVM]-G-[LIVM]-G-x-G-S-[LIVM]-C-x-T [C is the putative IMP-binding residue]-

[ 1] Collart F.R., Huberman E. J. Biol. Chem. 263:15769-15772(1988). 

[ 2] Natsumeda Y., Ohno S., Kawasaki K., Konno Y., Weber G., Suzuki K. J. Biol. Chem. 265:5292-5295(1990).
[ 3] Andrews S.C., Guest J.R. Biochem. J. 255:35-43(1988).

[0907] 306. (IPPc) Inositol polyphosphate phosphatase family, catalytic domain 
[0908] [1] York JD, Ponder JW, Chen ZW, Mathews FS, Majerus PW; Biochemistry 1994;33:13164-13171.
[2] Jefferson AB, Auethavekiat V, Pot DA, Williams LT, Majerus PW; J Biol Chem 1997;272:5983-5988.
[3] Zhang X, Jefferson AB, Auethavekiat V, Majerus PW; Proc Natl Acad Sci U S A 1995;92:4853-4856.
[4] York JD, Majerus PW; Proc Natl Acad Sci U S A 1990;87:9548-9552.
[5] Neuwald AF, York JD, Majerus PW; FEBS Lett 1991;294:16-18.

[0909] 307. IQ calmodulin-binding motif 

[1] Xie X, Harrison DH, Schlichting I, Sweet RM, Kalabokis VN, Szent-Gyorgyi AG, Cohen C; Nature 1994;368:306-312.

[2] Rhoads AR, Friedberg F; FASEB J 1997;11:331-340. 

[0910] 308. Inosine-uridine preferring nucleoside hydrolase family signature (IU nuc hydro)

Inosine-uridine preferring nucleoside hydrolase (EC 3.2.2.1) (IU-nucleoside hydrolase or IUNH) is an enzyme first
identified in protozoan [1] that catalyzes the hydrolysis of all of the commonly occurring purine and pyrimidine nucleosides
into ribose and the associated base, but has a preference for inosine and uridine as substrates. This enzyme is important
for these parasitic organisms, which are deficient in de novo synthesis of purines, to salvage the host purine nucleosides.
IUNH from Crithidia fasciculata has been sequenced and characterized; it is a homotetrameric enzyme of subunits
of 34 Kd. A histidine has been shown to be important for the catalytic mechanism; it acts as a proton donor to activate
the hypoxanthine leaving group. IUNH is evolutionarily related to a number of uncharacterized proteins from various
biological sources, notably: - Escherichia coli hypothetical protein yaaF. - Escherichia coli hypothetical protein ybeK.
- Escherichia coli hypothetical protein yeiK. - Fission yeast hypothetical protein SpAC17G8.02. - Yeast hypothetical
protein YDR400w. - A hypothetical protein from the archaebacteria Desulfurolobus ambivalens. As a signature pattern
for these proteins, a highly conserved region located in the N-terminal extremity was selected. This region contains
four conserved aspartates that have been shown [2] to be located in the active site cavity.
[0911] Consensus pattern: D-x-D-[PT]-[GA]-x-D-D-[TAV]-[VI]-A-

[ 1] Gopaul D.N., Meyer S.L., Degano M., Sacchettini J.C., Schramm V.L. Biochemistry 35:5963-5970(1996).

[ 2] Degano M., Gopaul D.N., Scapin G., Schramm V.L., Sacchettini J.C. Biochemistry 35:5971-5981(1996).

[0912] 309. (Insulinase)
Insulinase family, zinc-binding region signature
(aka Peptidase_M16)

[0913] A number of proteases dependent on divalent cations for their activity have been shown [1,2] to belong to 
one family, on the basis of sequence similarity. These enzymes are listed below. 

- Insulinase (EC 3.4.24.56) (also known as insulysin or insulin-degrading enzyme or IDE), a cytoplasmic enzyme
which seems to be involved in the cellular processing of insulin, glucagon and other small polypeptides.

- Escherichia coli protease III (EC 3.4.24.55) (pitrilysin) (gene ptr), a periplasmic enzyme that degrades small
peptides.

- Mitochondrial processing peptidase (EC 3.4.24.64) (MPP). This enzyme removes the transit peptide from the
precursor form of proteins imported from the cytoplasm across the mitochondrial inner membrane. It is composed of
two nonidentical homologous subunits termed alpha and beta. The beta subunit seems to be catalytically active
while the alpha subunit has probably lost its activity.

- Nardilysin (EC 3.4.24.61) (N-arginine dibasic convertase or NRD convertase). This mammalian enzyme cleaves
peptide substrates on the N-terminus of Arg residues in dibasic stretches.
[ 1] Neuwald A.F., York J.D., Majerus P.W. FEBS Lett. 294:16-18(1991).

[ 2] Glaeser H.-U., Thomas D., Gaxiola R., Montrichard F., Surdin-Kerjan Y., Serrano R. EMBO J. 12:3105-3110(1993).

[ 3] Bone R., Springer J.P., Atack J.R. Proc. Natl. Acad. Sci. U.S.A. 89:10031-10035(1992).

[0924] 313. Ion transport protein 

[0925] This family contains sodium, potassium and calcium ion channels. The family consists of 6 transmembrane helices in which
the last two helices flank a loop which determines ion selectivity. In some sub-families (e.g. Na channels) the domain
is repeated four times, whereas in others (e.g. K channels) the protein forms a tetramer in the membrane. A bacterial
structure of the protein is known for the last two helices but is not in the Pfam family because it lacks the first four helices.
[0926] 314. Isocitrate and isopropylmalate dehydrogenases signature (isodh) 

Isocitrate dehydrogenase (IDH) [1,2] is an important enzyme of carbohydrate metabolism which catalyzes the oxidative
decarboxylation of isocitrate into alpha-ketoglutarate. IDH is either dependent on NAD+ (EC 1.1.1.41) or on NADP+
(EC 1.1.1.42). In eukaryotes there are at least three isozymes of IDH: two are located in the mitochondrial matrix (one
is NAD+-dependent, the other NADP+-dependent), while the third one (also NADP+-dependent) is cytoplasmic. In
Escherichia coli the activity of a NADP+-dependent form of the enzyme is controlled by the phosphorylation of a serine
residue; the phosphorylated form of IDH is completely inactivated. 3-isopropylmalate dehydrogenase (EC 1.1.1.85)
(IMDH) [3,4] catalyzes the third step in the biosynthesis of leucine in bacteria and fungi, the oxidative decarboxylation
of 3-isopropylmalate into 2-oxo-4-methylvalerate. Tartrate dehydrogenase (EC 1.1.1.93) [5] catalyzes the oxidation of
tartrate to oxaloglycolate. These enzymes are evolutionarily related [1,3,4,5]. The best conserved region of these
enzymes is a glycine-rich stretch of residues located in the C-terminal section. This region was used as a signature pattern.
[0927] Consensus pattern: [NS]-[LIMYT]-[FYDN]-G-[DNT]-[IMVY]-x-[STGDN]-[DN]-x(2)-[SGAP]-x(3,4)-G-[STG]-
[LIVMPA]-G-[LIVMF]-

[ 1] Hurley J.H., Thorsness P.E., Ramalingam V., Helmers N.H., Koshland D.E. Jr., Stroud R.M. Proc. Natl. Acad.
Sci. U.S.A. 86:8635-8639(1989).

[ 2] Cupp J.R., McAlister-Henn L. J. Biol. Chem. 266:22199-22205(1991).

[ 3] Imada K., Sato M., Tanaka N., Katsube Y., Matsuura Y., Oshima T. J. Mol. Biol. 222:725-738(1991).
[ 4] Zhang T., Koshland D.E. Jr. Protein Sci. 4:84-92(1995).
[ 5] Tipton P.A., Beecher B.S. Arch. Biochem. Biophys. 313:15-21(1994).

[0928] 315. Jacalin-like lectin domain. 

[0929] Proteins containing this domain are lectins. It is found in 1 to 6 copies in these proteins. The domain is also 
found in the animal prostatic spermine-binding protein (Swiss:P15501). 
[0930] [1] Sankaranarayanan R, Sekar K, Banerjee R, Sharma V, Surolia A, Vijayan M; Nat Struct Biol 1996;3:596-603.

[0931] 316. KH domain 

[0932] KH motifs probably bind RNA directly. Autoantibodies to Nova, a KH domain protein, cause paraneoplastic
opsoclonus ataxia.

[1] Burd CG, Dreyfuss G, Science 1994;265:615-621.
[2] Musco G, Stier G, Joseph C, Castiglione Morelli MA, Nilges M, Gibson TJ, Pastore A, Cell 1996;85:237-245.

[0933] 317. Kelch motif 

[0934] The kelch motif was initially discovered in Kelch (Swiss:Q04652). In this protein there are six copies of the
motif. It has been shown that Swiss:Q04652 is related to Galactose Oxidase [1] for which a structure has been solved 
[2]. The kelch motif forms a beta sheet. Several of these sheets associate to form a beta propeller structure as found 
in neur, 

[0935] [1] Bork P, Doolittle RF, J Mol Biol 1994;236:1277-1282. [2] Ito N, Phillips SE, Stevens C, Ogel ZB, McPherson
MJ, Keen JN, Yadav KD, Knowles PF, Nature 1991;350:87-90.

[0936] 318. Soybean trypsin inhibitor (Kunitz) protease inhibitors family signature 

[0937] The soybean trypsin inhibitor (Kunitz) family [1] is one of the numerous families of proteinase inhibitors. It
comprises plant proteins which have inhibitory activity against serine proteinases from the trypsin and subtilisin families,
thiol proteinases and aspartic proteinases as well as some proteins that are probably involved in seed storage. This
family is currently known to group the following proteins: - Trypsin inhibitors A, B, C, KTI1, and KTI2 from soybean. -
Trypsin inhibitor DE3 from coral beans (Erythrina sp.). - Trypsin inhibitor DE5 from sandal bead tree. - Trypsin inhibitors
1A (WTI-1A), 1B (WTI-1B), and 2 (WTI-2) from goa bean. - Trypsin inhibitor from Acacia confusa. - Trypsin inhibitor
from silk tree. - Chymotrypsin inhibitor 3 (WCI-3) from goa bean. - Cathepsin D inhibitors PDI and NDI from potato [2],
which inhibit both cathepsin D (aspartic proteinase) and trypsin. - Alpha-amylase/subtilisin inhibitors from barley and
wheat. - Albumin-1 (WBA-1) from goa bean seeds [3]. - Miraculin from Richadella dulcifica [4], a sweet taste protein.
- Sporamin from sweet potato [5], the major tuberous root protein. - Thiol proteinase inhibitor PCPI 8.3 (P340) from
potato tuber [6]. - Wound responsive protein gwin3 from poplar tree [7]. - 21 Kd seed protein from cocoa [8]. All these
proteins contain from 170 to 200 amino acid residues and one or two intrachain disulfide bonds. The best conserved
region is found in their N-terminal section and is used as a signature pattern.
[0938] Consensus pattern: [LIVM]-x-D-x-[EDNTY]-[DG]-[RKHDENQ]-x-[LIVM]-x(5)-Y-x-[LIVM]-

[ 1] Laskowski M., Kato I. Annu. Rev. Biochem. 49:593-626(1980). 
[ 2] Ritonja A., Krizaj I., Mesko P., Kopitar M., Lucovnik P., Strukelj B., Pungercar J., Buttle D.J., Barrett A.J., Turk
V. FEBS Lett. 267:13-15(1990).

[ 3] Kortt A.A., Strike P.M., de Jersey J. Eur. J. Biochem. 181:403-408(1989).

[ 4] Theerasilp S., Hitotsuya H., Nakajo S., Nakaya K., Nakamura Y., Kurihara Y. J. Biol. Chem. 264:6655-6659(1989).

[ 5] Hattori T., Yoshida N., Nakamura K. Plant Mol. Biol. 13:563-572(1989).

[ 6] Krizaj I., Drobnic-Kosorok M., Brzin J., Jerala R., Turk V. FEBS Lett. 333:15-20(1993).

[ 7] Bradshaw H.D., Hollick J.B., Parsons T.J., Clarke H.R.G., Gordon M.P. Plant Mol. Biol. 14:51-59(1989).

[ 8] Tai H., McHenry L., Fritz P.J., Furtek D.B. Plant Mol. Biol. 16:913-915(1991). 

[0939] 319. Beta-ketoacyl synthases active site

Beta-ketoacyl-ACP synthase (KAS) [1] is the enzyme that catalyzes the condensation of malonyl-ACP with the growing
fatty acid chain. It is found as a component of the following enzymatic systems: - Fatty acid synthetase (FAS), which
catalyzes the formation of long-chain fatty acids from acetyl-CoA, malonyl-CoA and NADPH. Bacterial and plant
chloroplast FAS are composed of eight separate subunits which correspond to different enzymatic activities; beta-ketoacyl
synthase is one of these polypeptides. Fungal FAS consists of two multifunctional proteins, FAS1 and FAS2; the
beta-ketoacyl synthase domain is located in the C-terminal section of FAS2. Vertebrate FAS consists of a single
multifunctional chain; the beta-ketoacyl synthase domain is located in the N-terminal section [2]. - The multifunctional
6-methylsalicylic acid synthase (MSAS) from Penicillium patulum [3]. This is a multifunctional enzyme involved in the
biosynthesis of a polyketide antibiotic and which has a KAS domain in its N-terminal section. - Polyketide antibiotic synthase
enzyme systems. Polyketides are secondary metabolites produced by microorganisms and plants from simple fatty
acids. KAS is one of the components involved in the biosynthesis of the Streptomyces polyketide antibiotics granaticin
[4], tetracenomycin C [5] and erythromycin. - Emericella nidulans multifunctional protein Wa. Wa is involved in the
biosynthesis of conidial green pigment. Wa is a protein of 216 Kd that contains a KAS domain. - Rhizobium nodulation
protein nodE, which probably acts as a beta-ketoacyl synthase in the synthesis of the nodulation Nod factor fatty acyl
chain. - Yeast mitochondrial protein CEM1. The condensation reaction is a two step process: the acyl component of
an activated acyl primer is transferred to a cysteine residue of the enzyme and is then condensed with an activated
malonyl donor with the concomitant release of carbon dioxide. The sequence around the active site cysteine is well
conserved and can be used as a signature pattern.

[0940] Consensus pattern: G-x(4)-[LIVMFAP]-x(2)-[AGC]-C-[STA](2)-[STAG]-x(3)-[LIVMF] [C is the active site residue]

[ 1] Kauppinen S., Siggaard-Andersen M., von Wettstein-Knowles P. Carlsberg Res. Commun. 53:357-370(1988).
[ 2] Witkowski A., Rangan V.S., Randhawa Z.I., Amy C.M., Smith S. Eur. J. Biochem. 198:571-579(1991).
[ 3] Beck J., Ripka S., Siegner A., Schiltz E., Schweizer E. Eur. J. Biochem. 192:487-498(1990).
[ 4] Bibb M.J., Biro S., Motamedi H., Collins J.F., Hutchinson C.R. EMBO J. 8:2727-2736(1989).

[ 5] Sherman D.H., Malpartida F., Bibb M.J., Kieser H.M., Bibb M.J., Hopwood D.A. EMBO J. 8:2717-2725(1989).

[0941] 320. Kinesin motor domain signature and profile 

Kinesin [1,2,3] is a microtubule-associated force-producing protein that may play a role in organelle transport. Kinesin
is an oligomeric complex composed of two heavy chains and two light chains. The kinesin motor activity is directed
toward the microtubule's plus end. The heavy chain is composed of three structural domains: a large globular N-terminal
domain which is responsible for the motor activity of kinesin (it is known to hydrolyze ATP, to bind and move on
microtubules), a central alpha-helical coiled coil domain that mediates the heavy chain dimerization; and a small globular
C-terminal domain which interacts with other proteins (such as the kinesin light chains), vesicles and membranous
organelles. A number of proteins have been recently found that contain a domain similar to that of the kinesin 'motor'
domain [1,4,E1]: - Drosophila claret segregational protein (ncd). Ncd is required for normal chromosomal segregation
in meiosis, in females, and in early mitotic divisions of the embryo. The ncd motor activity is directed toward the
microtubule's minus end. - Drosophila kinesin-like protein (nod). Nod is required for the distributive chromosome
segregation of nonexchange chromosomes during meiosis. - Human CENP-E [4]. CENP-E is a protein that associates with
kinetochores during chromosome congression, relocates to the spindle midzone at anaphase, and is quantitatively
discarded at the end of the cell division. CENP-E is probably an important motor molecule in chromosome movement
and/or spindle elongation. - Human mitotic kinesin-like protein-1 (MKLP-1), a motor protein whose activity is directed
toward the microtubule's plus end. - Yeast KAR3 protein, which is essential for yeast nuclear fusion during mating.
KAR3 may mediate microtubule sliding during nuclear fusion and possibly mitosis. - Yeast CIN8 and KIP1 proteins
which are required for the assembly of the mitotic spindle. Both proteins seem to interact with spindle microtubules to
produce an outwardly directed force acting upon the poles. - Fission yeast cut7 protein, which is essential for spindle
body duplication during mitotic division. - Emericella nidulans bimC, which plays an important role in nuclear division.
- Emericella nidulans klpA. - Caenorhabditis elegans unc-104, which may be required for the transport of substances
needed for neuronal cell differentiation. - Caenorhabditis elegans osm-3. - Xenopus Eg5, which may be involved in
mitosis. - Arabidopsis thaliana KatA, KatB and katC. - Chlamydomonas reinhardtii FLA10/KHP1 and KLP1. Both
proteins seem to play a role in the rotation or twisting of the microtubules of the flagella. - Caenorhabditis elegans
hypothetical protein T09A5.2. The kinesin motor domain is located in the N-terminal part of most of the above proteins, with
the exception of KAR3, klpA, and ncd where it is located in the C-terminal section. The kinesin motor domain contains
about 330 amino acids. An ATP-binding motif of type A is found near position 80 to 90; the C-terminal half of the domain is
involved in microtubule-binding. The signature pattern for that domain is derived from a conserved decapeptide inside
the microtubule-binding part.

Consensus pattern: [GSA]-[KRHPSTQVM]-[LIVMF]-x-[LIVMF]-[IVC]-D-L-[AH]-G-[SAN]-E

[ 1] Bloom G.S., Endow S.A. Protein Prof. 2:1109-1171(1995).

[ 2] Vallee R.B., Shpetner H.S. Annu. Rev. Biochem. 59:909-932(1990). 

[ 3] Brady S.T. Trends Cell Biol. 5:159-164(1995). 

[ 4] Endow S.A. Trends Biochem. Sci. 16:221-225(1991).[E1] 


[0942] 321. Ribosomal protein L15 signature 

Ribosomal protein L15 is one of the proteins from the large ribosomal subunit. In Escherichia coli, L15 is known to bind
the 23S rRNA. It belongs to a family of ribosomal proteins which, on the basis of sequence similarities [1], groups: -
Eubacterial L15. - Plant chloroplast L15 (nuclear-encoded). - Archaebacterial L15. - Vertebrate L27a. - Tetrahymena
thermophila L29. - Fungi L27a (L29, CRP-1, CYH2). L15 is a protein of 144 to 154 amino-acid residues. As a signature
pattern a conserved region was selected in the C-terminal section of these proteins.
[0943] Consensus pattern: K-[LIVM](2)-[GASLM [...]
(2)-A-x(3)-[LIVM]-x(3)-G

[0944] [ 1] Otaka E., Hashimoto T., Mizuta K., Suzuki K. Protein Seq. Data Anal. 5:301-313(1993).

[0945] 322. LBP / BPI / CETP family signature

The following mammalian lipid-binding serum glycoproteins belong to the same family [1,2,3]: -
Lipopolysaccharide-binding protein (LBP). LBP binds to the lipid A moiety of bacterial lipopolysaccharides (LPS), a glycolipid present in
the outer membrane of all Gram-negative bacteria. The LBP/LPS complex seems to interact with the CD14 receptor
and may be responsible for the secretion of alpha-TNF. - Bactericidal permeability-increasing protein (BPI). Like LBP,
BPI binds LPS and has a cytotoxic activity on Gram-negative bacteria. - Cholesteryl ester transfer protein (CETP).
CETP is involved in the transfer of insoluble cholesteryl esters in reverse cholesterol transport. - Phospholipid transfer
protein (PLTP). May play a key role in extracellular phospholipid transport and modulation of HDL particles. These
proteins are structurally related and share many regions of sequence similarity. As a signature pattern one of these
regions was selected, which is located in the N-terminal section of these proteins; a region which could be involved in
the binding to the lipids [2].

Consensus pattern: [PA]-[GA]-[LIVMC]-x(2)-R-[IV]-[ST]-x(3)-L-x(5)-[EQ]-x(4)-[LIVM]-[EQK]-x(8)-P

[ 1] Schumann R.R., Leong S.R., Flaggs G.W., Gray P.W., Wright S.D., Mathison J.C., Tobias P.S., Ulevitch R.J.
Science 249:1429-1431(1990).
[ 2] Gray P.W., Flaggs G., Leong S.R., Gumina R.J., Weiss J., Ooi C.E., Elsbach P. J. Biol. Chem. 264:9505-9509
(1989).
[ 3] Day J.R., Albers J.J., Lofton-Day C.E., Gilbert T.L., Ching A.F.T., Grant F.J., O'Hara P.J., Marcovina S.M.,
Adolphson J.L. J. Biol. Chem. 269:9388-9391(1994).

[0946] 323. LIM domain signature and profile

Recently [1,2] a number of proteins have been found to contain a conserved cysteine-rich domain of about 60 amino-acid
residues. These proteins are: - Caenorhabditis elegans mec-3; a protein required for the differentiation of the set
of six touch receptor neurons in this nematode. - Caenorhabditis elegans lin-11; a protein required for the asymmetric
division of vulval blast cells. - Vertebrate insulin gene enhancer binding protein isl-1. Isl-1 binds to one of the two
cis-acting protein-binding domains of the insulin gene. - Vertebrate homeobox proteins lim-1, lim-2 (lim-5) and lim-3. -
Vertebrate lmx-1, which acts as a transcriptional activator by binding to the FLAT element; a beta-cell-specific
transcriptional enhancer found in the insulin gene. - Mammalian LH-2, a transcriptional regulatory protein involved in the
control of cell differentiation in developing lymphoid and neural cell types. - Drosophila protein apterous, required for
the normal development of the wing and halter imaginal discs. - Vertebrate protein kinases LIMK-1 and LIMK-2. -
Mammalian rhombotins. Rhombotin 1 (RBTN1 or TTG-1) and rhombotin-2 (RBTN2 or TTG-2) are proteins of about
160 amino acids whose genes are disrupted by chromosomal translocations in T-cell leukemia. - Mammalian and avian
cysteine-rich protein (CRP), a 192 amino-acid protein of unknown function. Seems to interact with zyxin. - Mammalian
cysteine-rich intestinal protein (CRIP), a small protein which seems to have a role in zinc absorption and may function
as an intracellular zinc transport protein. - Vertebrate paxillin, a cytoskeletal focal adhesion protein. - Mouse testin.
Mouse testin should not be confused with rat testin which is a thiol protease homolog. - Sunflower pollen specific protein
SF3. - Chicken zyxin. Zyxin is a low-abundance adhesion plaque protein which has been shown to interact with CRP.
- Yeast protein LRG1 which is involved in sporulation [4]. - Yeast rho-type GTPase activating protein RGA1/DBM1. -
Caenorhabditis elegans homeobox protein ceh-14. - Caenorhabditis elegans homeobox protein unc-97. - Yeast
hypothetical protein YKR090w. - Caenorhabditis elegans hypothetical proteins C28H8.6. These proteins generally have two
tandem copies of a domain, called LIM (for Lin-11 Isl-1 Mec-3) in their N-terminal section. Zyxin and paxillin are
exceptions in that they contain respectively three and four LIM domains at their C-terminal extremity. In apterous, isl-1,
LH-2, lin-11, lim-1 to lim-3, lmx-1 and ceh-14 and mec-3 there is a homeobox domain some 50 to 95 amino acids after
the LIM domains. In the LIM domain, there are seven conserved cysteine residues and a histidine. The arrangement
followed by these conserved residues is C-x(2)-C-x(16,23)-H-x(2)-[CH]-x(2)-C-x(2)-C-x(16,21)-C-x(2,3)-[CHD]. The
LIM domain binds two zinc ions [5]. LIM does not bind DNA, rather it seems to act as interface for protein-protein
interaction. A pattern was developed that spans the first half of the LIM domain.

[0947] Consensus pattern: C-x(2)-C-x(15,21)-[FYWH]-H-x(2)-[CH]-x(2)-C-x(2)-C-x(3)-[LIVMF] [The 5 C's and the H
bind zinc]
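
Consensus patterns of this kind use a compact notation: residues are separated by '-', 'x' stands for any residue, square brackets list the allowed residues, curly braces list the excluded residues, and a parenthesized number or range gives a repeat count. As an illustration only, the following minimal Python sketch translates such a pattern into a regular expression and scans a sequence with it. The function name `prosite_to_regex`, the constant `LIM_FIRST_HALF` and the test sequence are assumptions introduced for this example and are not part of the specification; terminal anchors are not handled.

```python
import re

def prosite_to_regex(pattern: str) -> str:
    """Translate a consensus pattern (PROSITE-style notation) into a Python regex.

    Hypothetical helper for illustration; it supports only the elements
    x, [..], {..}, single residues and the repeat suffixes (n) and (n,m).
    """
    parts = []
    for element in pattern.strip().rstrip(".").split("-"):
        element = element.strip()
        # Split an element into its core and an optional repeat suffix.
        m = re.fullmatch(r"(.+?)(?:\((\d+)(?:,(\d+))?\))?", element)
        if m is None:
            raise ValueError(f"empty pattern element in {pattern!r}")
        core, low, high = m.group(1), m.group(2), m.group(3)
        if core == "x":
            token = "."                        # any residue
        elif core.startswith("[") and core.endswith("]"):
            token = core                       # allowed residues
        elif core.startswith("{") and core.endswith("}"):
            token = "[^" + core[1:-1] + "]"    # excluded residues
        elif len(core) == 1 and core.isalpha():
            token = core                       # a single fixed residue
        else:
            raise ValueError(f"unsupported pattern element: {element!r}")
        if low is not None:
            token += "{" + low + ("," + high if high else "") + "}"
        parts.append(token)
    return "".join(parts)

# The LIM pattern as given in the text above.
LIM_FIRST_HALF = "C-x(2)-C-x(15,21)-[FYWH]-H-x(2)-[CH]-x(2)-C-x(2)-C-x(3)-[LIVMF]"

# A made-up sequence that satisfies the pattern (not a real protein).
synthetic = "C" + "AA" + "C" + "A" * 15 + "FH" + "AA" + "C" + "AA" + "C" + "AA" + "C" + "AAA" + "L"
hit = re.search(prosite_to_regex(LIM_FIRST_HALF), synthetic)
```

A non-`None` value of `hit` indicates that the sequence contains the signature; `hit.start()` gives the 0-based offset of the first matching residue.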

[ 1] Freyd G., Kim S.K., Horvitz H.R. Nature 344:876-879(1990).
[ 2] Baltz R., Evrard J.-L., Domon C., Steinmetz A. Plant Cell 4:1465-1466(1992).
[ 3] Sanchez-Garcia I., Rabbitts T.H. Trends Genet. 10:315-320(1994).
[ 4] Mueller A., Xu G., Wells R., Hollenberg C.P., Piepersberg W. Nucleic Acids Res. 22:3151-3154(1994).
[ 5] Michelsen J.W., Schmeichel K.L., Beckerle M.C., Winge D.R. Proc. Natl. Acad. Sci. U.S.A. 90:4404-4408
(1993).

[0948] 324. (LRR) Leucine Rich Repeat 

CAUTION: This Pfam may not find all Leucine Rich Repeats in a protein. Leucine Rich Repeats are short sequence
motifs present in a number of proteins with diverse functions and cellular locations. These repeats are usually involved
in protein-protein interactions. Each Leucine Rich Repeat is composed of a beta-alpha unit. These units form elongated
non-globular structures. Leucine Rich Repeats are often flanked by cysteine rich domains. Number of members: 3017
[1] The leucine-rich repeat: a versatile binding motif. Kobe B, Deisenhofer J; Trends Biochem Sci 1994;19:415-421.
[2] Crystal structure of porcine ribonuclease inhibitor, a protein with leucine-rich repeats. Kobe B, Deisenhofer J; Nature
1993;366:751-756.

[0949] 325. Plant lipid transfer protein family signature (LTP) 

[0950] Plant cells contain proteins, called lipid transfer proteins (LTP) [1,2,3], which are able to facilitate the transfer
of phospholipids and other lipids across membranes. These proteins, whose subcellular location is not yet known, could
play a major role in membrane biogenesis by conveying phospholipids such as waxes or cutin from their site of
biosynthesis to membranes unable to form these lipids. Plant LTPs are proteins of about 9 Kd (90 amino acids) which
contain eight conserved cysteine residues all involved in disulfide bridges, as shown in the following schematic
representation.


xCxxxxCxxxxxxCCxxxxxxxxCxCxxxxxxxxxxxCxxxxxxCxx

'C': conserved cysteine involved in a disulfide bond.
'*': position of the pattern.




EP 1 033 405 A2 




[0951] Consensus pattern: [LIVM]-[PA]-x(2)-C-x-[LIVM]-x-[LIVM]-x-[LIVMFY]-x-[LIVM]-[ST]-x(3)-[DN]-C-x(2)-[LIVM]
[The two C's are involved in disulfide bonds]

[1] Wirtz K.W.A. Annu. Rev. Biochem. 60:73-99(1991).
[2] Arondel V., Kader J.C. Experientia 46:579-585(1990).
[3] Ohlrogge J.B., Browse J., Somerville C.R. Biochim. Biophys. Acta 1082:1-26(1991).
[0952] 326. (LAMP) Lysosome-associated membrane glycoproteins signatures 

Lysosome-associated membrane glycoproteins (lamp) [1] are integral membrane proteins, specific to lysosomes, and
whose exact biological function is not yet clear. Structurally, the lamp proteins consist of two internally homologous
lysosome-luminal domains separated by a proline-rich hinge region; at the C-terminal extremity there is a
transmembrane region followed by a very short cytoplasmic tail. In each of the duplicated domains, there are two conserved
disulfide bonds. This structure is schematically represented in the figure below.

xCxxxxxCxxxxxxxxxxxxCxxxxxCxxxxxxxxxCxxxxxCxxxxxxxxxxxxCxxxxxCxxxxxxxx

< xHingex ><TMxC> 


In mammals, there are two closely related types of lamp: lamp-1 and lamp-2. In chicken lamp-1 is known as LEP100. The
macrophage protein CD68 (or macrosialin) [2] is a heavily glycosylated integral membrane protein whose structure
consists of a mucin-like domain followed by a proline-rich hinge; a single lamp-like domain; a transmembrane region
and a short cytoplasmic tail. Two signature patterns for this family of proteins were developed. The first one is centered
on the first conserved cysteine of the duplicated domains. The second corresponds to a region that includes the
extremity of the second domain, the totality of the transmembrane region and the cytoplasmic tail.
[0953] Consensus pattern: [STA]-C-[LIVM]-[LIVMFYW]-A-x-[LIVMFYW]-x(3)-[LIVMFYW]-x(3)-Y [C is involved in a
disulfide bond]
Consensus pattern: C-x(2)-D-x(3,4)-[LIVM](2)-P-[L…[LIVM]-x(2)-[KR]-[RH]-x(1,2)-[STAG](2)-Y-[EQ] [C is involved
in a disulfide bond]

[ 1] Fukuda M. J. Biol. Chem. 266:21327-21330(1991).
[ 2] Holness C.L., da Silva R.P., Fawcett J., Gordon S., Simmons D.L. J. Biol. Chem. 268:9661-9666(1993).


[0954] 327. Lipolytic enzymes "G-D-S-L" family, serine active site 

[0955] Recently [1], a family of lipolytic enzymes has been characterized. This family currently consists of the following
proteins:

- Aeromonas hydrophila lipase/phosphatidylcholine-sterol acyltransferase.
- Xenorhabdus luminescens lipase 1.
- Vibrio mimicus arylesterase.
- Escherichia coli acyl-coA thioesterase I (gene tesA).
- Vibrio parahaemolyticus thermolabile hemolysin/atypical phospholipase.
- Rabbit phospholipase AdRab-B, an intestinal brush border protein with esterase and phospholipase A/lysophospholipase
activity that could be involved in the uptake of dietary lipids. AdRab-B contains four repeats of about
320 amino acids.
- Arabidopsis thaliana and Brassica napus anther-specific proline-rich protein APG.
- A Pseudomonas putida hypothetical protein in trpE-trpG intergenic region.

A serine has been identified as part of the active site in the Aeromonas, Vibrio mimicus and Escherichia coli enzymes.
It is located in a conserved sequence motif that can be used as a signature pattern for these proteins.

Consensus pattern: [LIVMFYAG](4)-G-D-S-[LIVM]-x(1,2)-[TAG]-G [S is the active site residue]
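
Because the pattern marks one specific residue (the active-site serine), a scan can report the position of that residue rather than only the presence of a match. The following Python sketch illustrates this with a capture group around the serine. The names `GDSL_SIGNATURE` and `gdsl_active_site_positions` and the test sequence are assumptions made for this example, not part of the specification.

```python
import re

# Regex form of [LIVMFYAG](4)-G-D-S-[LIVM]-x(1,2)-[TAG]-G.
# The serine is captured as group 1; the lookahead (?=...) allows
# overlapping candidate matches to be reported as well.
GDSL_SIGNATURE = re.compile(r"(?=[LIVMFYAG]{4}GD(S)[LIVM].{1,2}[TAG]G)")

def gdsl_active_site_positions(sequence: str) -> list:
    """Return 1-based positions of candidate active-site serines."""
    return [m.start(1) + 1 for m in GDSL_SIGNATURE.finditer(sequence)]

# Made-up sequence for illustration (not a real enzyme).
example = "MKLVIFGDSLSDTGNN"
positions = gdsl_active_site_positions(example)
```

A match only flags a candidate; whether the serine is catalytically active in a given protein has to be established experimentally.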

[0956] 328. (Lipoprotein 4) Prokaryotic membrane lipoprotein lipid attachment site. In prokaryotes, membrane
lipoproteins are synthesized with a precursor signal peptide, which is cleaved by a specific lipoprotein signal peptidase
(signal peptidase II). The peptidase recognizes a conserved sequence and cuts upstream of a cysteine residue to which
a glyceride-fatty acid lipid is attached [1]. Some of the proteins known to undergo such processing currently include
(for recent listings see [1,2,3]): - Major outer membrane lipoprotein (murein-lipoproteins) (gene lpp). - Escherichia coli
lipoprotein-28 (gene nlpA). - Escherichia coli lipoprotein-34 (gene nlpB). - Escherichia coli lipoprotein nlpC. -
Escherichia coli lipoprotein nlpD. - Escherichia coli osmotically inducible lipoprotein B (gene osmB). - Escherichia coli
osmotically inducible lipoprotein E (gene osmE). - Escherichia coli peptidoglycan-associated lipoprotein (gene pal). -
Escherichia coli rare lipoproteins A and B (genes rlpA and rlpB). - Escherichia coli copper homeostasis protein cutF
(or nlpE). - Escherichia coli plasmids traT proteins. - Escherichia coli Col plasmids lysis proteins. - A number of Bacillus
beta-lactamases. - Bacillus subtilis periplasmic oligopeptide-binding protein (gene oppA). - Borrelia burgdorferi outer
surface proteins A and B (genes ospA and ospB). - Borrelia hermsii variable major protein 21 (gene vmp21) and 7
(gene vmp7). - Chlamydia trachomatis outer membrane protein 3 (gene omp3). - Fibrobacter succinogenes
endoglucanase cel-3. - Haemophilus influenzae proteins Pal and Pcp. - Klebsiella pullulanase (gene pulA). - Klebsiella
pullulanase secretion protein pulS. - Mycoplasma hyorhinis protein p37. - Mycoplasma hyorhinis variant surface antigens
A, B, and C (genes vlpABC). - Neisseria outer membrane protein H.8. - Pseudomonas aeruginosa lipopeptide (gene
lppL). - Pseudomonas solanacearum endoglucanase egl. - Rhodopseudomonas viridis reaction center cytochrome
subunit (gene cytC). - Rickettsia 17 Kd antigen. - Shigella flexneri invasion plasmid proteins mxiJ and mxiM. -
Streptococcus pneumoniae oligopeptide transport protein A (gene amiA). - Treponema pallidum 34 Kd antigen. - Treponema
pallidum membrane protein A (gene tmpA). - Vibrio harveyi chitobiase (gene chb). - Yersinia virulence plasmid protein
yscJ. - Halocyanin from Natronobacterium pharaonis [4], a membrane associated copper-binding protein. This is the
first archaebacterial protein known to be modified in such a fashion. From the precursor sequences of all these proteins,
a consensus pattern and a set of rules to identify this type of post-translational modification were derived.

[0957] Consensus pattern: {DERK}(6)-[LIVMFWSTAG](2)-[LIVMFYSTAGCQ]-[AGS]-C [C is the lipid attachment
site] Additional rules: 1) The cysteine must be between positions 15 and 35 of the sequence in consideration. 2) There
must be at least one Lys or one Arg in the first seven positions of the sequence.
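
The pattern and its two additional rules can be combined into a single check. The Python sketch below is illustrative only: the names `LIPOBOX` and `find_lipid_attachment_site` are assumptions introduced here, positions are counted from 1 at the first residue of the precursor, and the range 15 to 35 is assumed to be inclusive. The test sequence is a made-up precursor-like string, not taken from the specification.

```python
import re

# Regex form of {DERK}(6)-[LIVMFWSTAG](2)-[LIVMFYSTAGCQ]-[AGS]-C.
# The pattern spans 11 residues and the cysteine is the last one.
# The lookahead lets overlapping candidates be examined in turn.
LIPOBOX = re.compile(r"(?=[^DERK]{6}[LIVMFWSTAG]{2}[LIVMFYSTAGCQ][AGS]C)")
PATTERN_LENGTH = 11

def find_lipid_attachment_site(precursor: str):
    """Return the 1-based position of a candidate lipidated cysteine, or None."""
    # Rule 2: at least one Lys or Arg in the first seven positions.
    if not any(residue in "KR" for residue in precursor[:7]):
        return None
    for m in LIPOBOX.finditer(precursor):
        cysteine_position = m.start() + PATTERN_LENGTH  # 1-based
        # Rule 1: the cysteine must lie between positions 15 and 35.
        if 15 <= cysteine_position <= 35:
            return cysteine_position
    return None

# Made-up precursor-like sequence for illustration.
site = find_lipid_attachment_site("MKATKLVLGAVILGSTLLAGCSSNAKIDQ")
```

Applying the rules after the pattern match keeps the regular expression identical to the published pattern, so the two rules can be relaxed or tightened independently.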

[ 1] Hayashi S., Wu H.C. J. Bioenerg. Biomembr. 22:451-471(1990).
[ 2] Klein P., Somorjai R.L., Lau P.C.K. Protein Eng. 2:15-20(1988).
[ 3] von Heijne G. Protein Eng. 2:531-534(1989).
[ 4] Mattar S., Scharf B., Kent S.B.H., Rodewald K., Oesterhelt D., Engelhard M. J. Biol. Chem. 269:14939-14945
(1994).

[0958] 329. (lipoprotein 5) Prokaryotic membrane lipoprotein lipid attachment site. In prokaryotes, membrane
lipoproteins are synthesized with a precursor signal peptide, which is cleaved by a specific lipoprotein signal peptidase
(signal peptidase II). The peptidase recognizes a conserved sequence and cuts upstream of a cysteine residue to
which a glyceride-fatty acid lipid is attached [1]. Some of the proteins known to undergo such processing currently
include (for recent listings see [1,2,3]): - Major outer membrane lipoprotein (murein-lipoproteins) (gene lpp). -
Escherichia coli lipoprotein-28 (gene nlpA). - Escherichia coli lipoprotein-34 (gene nlpB). - Escherichia coli lipoprotein
nlpC. - Escherichia coli lipoprotein nlpD. - Escherichia coli osmotically inducible lipoprotein B (gene osmB). - Escherichia
coli osmotically inducible lipoprotein E (gene osmE). - Escherichia coli peptidoglycan-associated lipoprotein (gene pal).
- Escherichia coli rare lipoproteins A and B (genes rlpA and rlpB). - Escherichia coli copper homeostasis protein cutF
(or nlpE). - Escherichia coli plasmids traT proteins. - Escherichia coli Col plasmids lysis proteins. - A number of Bacillus
beta-lactamases. - Bacillus subtilis periplasmic oligopeptide-binding protein (gene oppA). - Borrelia burgdorferi outer
surface proteins A and B (genes ospA and ospB). - Borrelia hermsii variable major protein 21 (gene vmp21) and 7
(gene vmp7). - Chlamydia trachomatis outer membrane protein 3 (gene omp3). - Fibrobacter succinogenes
endoglucanase cel-3. - Haemophilus influenzae proteins Pal and Pcp. - Klebsiella pullulanase (gene pulA). - Klebsiella
pullulanase secretion protein pulS. - Mycoplasma hyorhinis protein p37. - Mycoplasma hyorhinis variant surface antigens
A, B, and C (genes vlpABC). - Neisseria outer membrane protein H.8. - Pseudomonas aeruginosa lipopeptide (gene
lppL). - Pseudomonas solanacearum endoglucanase egl. - Rhodopseudomonas viridis reaction center cytochrome
subunit (gene cytC). - Rickettsia 17 Kd antigen. - Shigella flexneri invasion plasmid proteins mxiJ and mxiM. -
Streptococcus pneumoniae oligopeptide transport protein A (gene amiA). - Treponema pallidum 34 Kd antigen. - Treponema
pallidum membrane protein A (gene tmpA). - Vibrio harveyi chitobiase (gene chb). - Yersinia virulence plasmid protein
yscJ. - Halocyanin from Natronobacterium pharaonis [4], a membrane associated copper-binding protein. This is the first
archaebacterial protein known to be modified in such a fashion. From the precursor sequences of all these proteins,
a consensus pattern and a set of rules to identify this type of post-translational modification have been developed.
[0959] Consensus pattern: {DERK}(6)-[LIVMFWSTAG](2)-[LIVMFYSTAGCQ]-[AGS]-C [C is the lipid attachment
site] Additional rules: 1) The cysteine must be between positions 15 and 35 of the sequence in consideration. 2) There
must be at least one Lys or one Arg in the first seven positions of the sequence.

[0960] [ 1] Hayashi S., Wu H.C. J. Bioenerg. Biomembr. 22:451-471(1990).
[ 2] Klein P., Somorjai R.L., Lau P.C.K. Protein Eng. 2:15-20(1988).
[ 3] von Heijne G. Protein Eng. 2:531-534(1989).
[ 4] Mattar S., Scharf B., Kent S.B.H., Rodewald K., Oesterhelt D., Engelhard M. J. Biol. Chem. 269:14939-14945(1994).
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[09701 Malatedehydrogena^^ 

EES dehydrogenase (f^^, ^ paltici ates in the isozymes: one which -s located 

NAD/NADH colactor system. The enzy p eukaryotlc celte there a a g, yo xysomal °«^' c £ 

orokaryotic organisms certain >a smg e to , asm . Fu ngi and plants a s ° n dent , orm ol MDH (EC 

5TS e mitochondria, mat* and the «h e n. ^ ^^J^ s P ec^edC4 eye e 

^TlS vS,SSo'vXa proton ^J^?^^ R are the activate res,- 
anism W an aspart,c ac d wh c ^ s T . lTRKMN] . L .D-x(2)-R-lSTAl-x(3) IUV 
[0971] Consensus pattern, ILIVM) i 
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J 41 Cendrin F.. Chroboczek J., Zacca. i*.. 

l09721 334. Legume lectins ^rj. . which ^ called leg ume lectins ™££*Z^ 

Leguminous plants synthes.ze ^^SS^ is n0t ^ *" T Te Tec Z bind a cium and manga- 

10 und in the seeds. The £ protection against P«-^^SiS^230 to 260 amino acid 

ni ,rogen-fixing barter- «^^"™^ lec ,ins are synthesized as precu ^ ° la cor(esponds to the 

nese (or other uansfton mtfato) L ^ um processe d to produce ^.^Xpt C nal in that the two chains 

residues. Some legume are J" dm y y anavalin A (conA) from lack bean .s except correspon ds 

Consensus pattern. IUV] x n 

•OWfl 305. CoA.,i 9 ». SS sy „,„„= S o *»: »- <=<* ""' 

Proteins which transport small "V^P™ clure architecture |1 to 5]. This is an ^ . has been proposed 

o, sequence homology and a f^^SXmal ligandbinding site p .3| The name provided tor 
barrel with a repeated + 1 'f^^TtlEnQ *> «* ^ are ^d porphyrin. Alpha-", -acid 
15] tor this protein 1am.* Pr^ns (protein HC). ^^^STpoSnda 16]. - Aphrodis.n 

ecently determined sequenc^V - Wpha m afray o1 natural and synthJ.c co P neme .r e .ated 

gly coprotein (»«»» 3 7^J^^ii^ pheromone. - Apolipopro «n D wNch P« V . Compl ennent 
which, in hamsters. a mS protein whose physiological ,u " c, '°" a a PP D Xin°rom lobster carapace, which 

compounds. - Beta-laclogtobuhn a mflKp ^ _ Crustacyan.n [8], a pra e n tr maturation, 

component C8 gamma cha.n wt,ch >™™ Mc acid binding protein ™ . ut0 Lacla , io n protein 

binds astaxanthin, a «^?^ and a related bu.ter.ly b -n- bind JU^gJ,,, (NGAL) (P 25) (SV-40 

.|nsectacyanin,amoth b,l.n-b.nd^g p Neutrophil gelat.nase-assoc.at « P rotln ol-binding proteins 

(LALP). a mlk ^ Spiin (OBP), which b ^i°f^% B ^X^ P rolein ' ' Pr ° S - 

induced 24 P 3 prote.n) 111]. - ^^^e'J^etrlal alpha-2 globulin. - P^ 3 , 51 ^' ^ Enzymatic activity 112]. - 
(PPBP). - Humar JP"^^ ^ ^sh dependent PGD «y^^^?^Ln chicken (embryo 
taglandin D synthase (EC 55i&j> J" , and neparin . . Quiescence specie P r °'^P sp 1 and2 , putal.ve 
pJrpurin. a retinal prote.n wh,ch ^ ^ ^ 2 . micr ' 0 g,obulin), which may b ^^°^T ottilein (VEGP) ]14] (also 
CH21 protein). -Rodent urinary ^^^oJa^ organ 113]. - Von ^^^^Oc^^, which 
pneromone transport prote.ns rom be involved in taste ^^^^supposeti lunction 

called tear lipocalin). a "^^^^^i*^ 1H.W o. the toad But . Meww w ha PP jy (lesr 
may transport odorants. - . prote n lound n the ^ p ^ . ^ ^J-^-^S, (1 B ,. - P.ok.uy- 

-, similar to transthyretin -n ^ 
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or two ol these |3,18], A signature pattern was built around the first, common to all outlier and kemallipocalins. which 

£5 °'aMern!^|DEN^ W-^YWLRH],- 

Note: It is suggested, on the basis of similarities of structure, function, and sequence, that this family forms an overall
superfamily, called the calycins, with the avidin/streptavidin <PDOC00499> and the cytosolic fatty-acid binding proteins
<PDOC00188> families [3,19].

[ 1] Cowan S.W., Newcomer M.E., Jones T.A. Proteins 8:44-61(1990).
[ 2] Igarashi M., Nagata A., Toh H., Urade Y., Hayaishi O. Proc. Natl. Acad. Sci. U.S.A. 89:5376-5380(1992).
[ 3] Flower D.R., North A.C.T., Attwood T.K. Protein Sci. 2:753-761(1993).
[ 4] Godovac-Zimmermann J. Trends Biochem. Sci. 13:64-66(1988).
[ 5] Pervaiz S., Brew K. FASEB J. 1:209-214(1987).
[ 6] Kremer J.M.H., Wilting J., Janssen L.H.M. Pharmacol. Rev. 40:1-47(1988).
[ 7] Haefliger J.-A., Peitsch M.C., Jenne D., Tschopp J. Mol. Immunol. 28:123-131(1991).
[ 8] Keen J.N., Caceres I., Eliopoulos E.E., Zagalsky P.F., Findlay J.B.C. Eur. J. Biochem. 197:407-417(1991).
[ 9] Newcomer M.E. Structure 1:7-18(1993).
[10] Collet C., Joseph R. Biochim. Biophys. Acta 1167:219-222(1993).
[11] Kjeldsen L., Johnsen A.H., Sengelov H., Borregaard N. J. Biol. Chem. 268:10425-10432(1993).
[12] Peitsch M.C., Boguski M.S. Trends Biochem. Sci. 16:363-363(1991).
[13] Miyawaki A., Matsushita F., Ryo Y., Mikoshiba K. EMBO J. 13:5835-5842(1994).
[14] Kock K., Ahlers C., Schmale H. Eur. J. Biochem. 221:905-916(1994).
[15] Achen M.G., Harms P.J., Thomas T., Richardson S.J., Wettenhall R.E.H., Schreiber G. J. Biol. Chem. 267:
23170-23174(1992).
[16] Morel L., Dufaure J.-P., Depeiges A. J. Biol. Chem. 268:10274-10281(1993).
[17] Bishop R.E., Penfold S.S., Frost L.S., Holtje J.V., Weiner J.H. J. Biol. Chem. 270:23097-23103(1995).
[18] Flower D.R., North A.C.T., Attwood T.K. Biochem. Biophys. Res. Commun. 180:69-74(1991).
[19] Flower D.R. FEBS Lett. 333:99-102(1993).

[0982] Cytosolic fatty-acid binding proteins signature (lip2)

A number of low molecular weight proteins which bind fatty acids and other organic solutes are present in the cytosol
[1,2]. Most of them are structurally related and have probably diverged from a common ancestor. This structure is a
ten-stranded antiparallel beta-barrel, albeit with a wide discontinuity between the fourth and fifth strands … topology
enclosing an internal ligand binding site [2,7]. Proteins known to belong to this family include: - Six tissue-specific
types of fatty acid binding proteins (FABPs) found in liver, intestine, heart, epidermal, adipocyte, brain/retina.
Heart FABP is also known as mammary-derived growth inhibitor (MDGI), a protein that reversibly inhibits proliferation
of mammary carcinoma cells. Epidermal FABP is also known as psoriasis-associated FABP [3]. - Insect muscle fatty
acid-binding proteins. - Testis lipid binding protein (TLBP). - Cellular retinol-binding proteins I and II. - Cellular
retinoic acid-binding protein (CRABP). - Gastrotropin, an ileal protein which stimulates gastric acid and pepsinogen
secretion. It seems that gastrotropin binds to bile salts and bilirubins. - Fatty acid binding proteins MFB1 and MFB2
from the midgut of the insect Manduca sexta [4]. In addition to the above cytosolic proteins, this family also includes: -
Myelin P2 protein, which may be a lipid transport protein in Schwann cells. P2 is associated with the lipid bilayer of
myelin. - Schistosoma mansoni protein Sm14 [5] which seems to be involved in the transport of fatty acids. - Ascaris
suum p18, a … protein that may play a role in sequestering potentially toxic fatty acids and their …
products or that may be involved in the maintenance of the impermeable lipid layer of the eggshell. - Hypothetical fatty
acid-binding proteins F40F4.2, F40F4.3, F40F4.4 and ZK742.5 from Caenorhabditis elegans. As a signature pattern
for these proteins a segment from the N-terminal extremity was used.

[0983] Consensus pattern: [GSAIVK]-x-[FYW]-x-[LIVMF]-x(4)-[NHG]-[FY]-[DE]-x-[LIVMFY]-[LIVM]-x(2)-[LIV…

Note: It is suggested, on the basis of similarities of structure, function, and sequence, that this family forms an overall
superfamily, called the calycins, with the lipocalin <PDOC00187> and avidin/streptavidin <PDOC00499> families [6,7].

[ 1] Bernier I., Jolles P. Biochimie 69:1127-1152(1987).
[ 2] Veerkamp J.H., Peeters R.A., Maatman R.G.H.J. Biochim. Biophys. Acta 1081:1-24(1991).
[ 3] Siegenthaler G., Hotz R., Chatellard-Gruaz D., Didierjean L., Hellman U., Saurat J.-H. Biochem. J. 302:363-371
(1994).
[ 4] Smith A.F., Tsuchida K., Hanneman E., Suzuki T.C., Wells M.A. J. Biol. Chem. 267:380-384(1992).
[ 5] Moser D., Tendler M., Griffiths G., Klinkert M.-Q. J. Biol. Chem. 266:8447-8454(1991).




[ 6] Flower D.R., North A.C.T, Attwood T.K. Protein Sci. 2:753-761(1993). 
[ 7] Flower D.R. FEBS Lett. 333:99-102(1993).

[0984] 338. Lipoxygenases iron-binding region signatures 

Lipoxygenases (EC 1.13.11.-) are a class of iron-containing dioxygenases which catalyze the hydroperoxidation of
lipids containing a cis,cis-1,4-pentadiene structure. They are common in plants where they may be involved in a number
of diverse aspects of plant physiology including growth and development, pest resistance, and senescence or
responses to wounding [1]. In mammals a number of lipoxygenase isozymes are involved in the metabolism of prostaglandins
and leukotrienes [2]. Sequence data is available for the following lipoxygenases: - Plant lipoxygenases (EC 1.13.11.12).
Plants express a variety of cytosolic isozymes as well as what seems [3] to be a chloroplast isozyme. - Mammalian
arachidonate 5-lipoxygenase (EC 1.13.11.34). - Mammalian arachidonate 12-lipoxygenase (EC 1.13.11.31). -
Mammalian erythroid cell-specific 15-lipoxygenase (EC 1.13.11.33). The iron atom in lipoxygenases is bound by four ligands,
three of which are histidine residues [4]. Six histidines are conserved in all lipoxygenase sequences; five of them are
found clustered in a stretch of 40 amino acids. This region contains two of the three histidine iron-ligands; the other
histidines have been shown [5] to be important for the activity of lipoxygenases. As signatures for this family of enzymes two
patterns in the region of the histidine cluster were selected. The first pattern contains the first three conserved histidines
and the second pattern includes the fourth and the fifth.

Consensus pattern: H-[EQ]-x(3)-H-x-[LM]-[NQRC]-[GST]-H-[LIVMSTAC](3)-E [The second and third H's bind iron]
[0985] Consensus pattern: [LIVMA]-H-P-[LIVM]-x-[KRG]-[LIVMF](2)-x-[AP]-H

[ 1] Vick B.A., Zimmerman D.C. (In) Biochemistry of plants: A comprehensive treatise, Stumpf P.K., Ed., Vol. 9,
pp.53-90, Academic Press, New York, (1987).
[ 2] Needleman P., Turk J., Jakschik B.A., Morrison A.R., Lefkowith J.B. Annu. Rev. Biochem. 55:69-102(1986).
[ 3] Peng Y.L., Shirano Y., Ohta H., Hibino T., Tanaka K., Shibata D. J. Biol. Chem. 269:3755-3761(1994).
[ 4] Boyington J.C., Gaffney B.J., Amzel L.M. Science 260:1482-1486(1993).
[ 5] Steczko J., Donoho G.P., Clemens J.C., Dixon J.E., Axelrod B. Biochemistry 31:4053-4057(1992).

[0986] 339. Fumarate lyases signature (lyase_1) 

A number of enzymes, belonging to the lyase class, for which fumarate is a substrate have been shown [1,2] to share
a short conserved sequence around a methionine which is probably involved in the catalytic activity of this type of
enzymes. These enzymes are: - Fumarase (EC 4.2.1.2) (fumarate hydratase), which catalyzes the reversible hydration
of fumarate to L-malate. There seem to be 2 classes of fumarases: class I are thermolabile dimeric enzymes (as for
example: Escherichia coli fumA and fumB); class II enzymes are thermostable and tetrameric and are found in prokaryotes
(as for example: Escherichia coli fumC) as well as in eukaryotes. The sequences of the two classes of fumarases
are not closely related. - Aspartate ammonia-lyase (EC 4.3.1.1) (aspartase), which catalyzes the reversible conversion
of aspartate to fumarate and ammonia. This reaction is analogous to that catalyzed by fumarase, except that ammonia
rather than water is involved in the trans-elimination reaction. - Arginosuccinase (EC 4.3.2.1) (argininosuccinate lyase),
which catalyzes the formation of arginine and fumarate from argininosuccinate, the last step in the biosynthesis of
arginine. - Adenylosuccinase (EC 4.3.2.2) (adenylosuccinate lyase) [3], which catalyzes the eighth step in the de novo
biosynthesis of purines, the formation of 5'-phosphoribosyl-5-amino-4-imidazolecarboxamide and fumarate from
1-(5-phosphoribosyl)-4-(N-succino-carboxamide). That enzyme can also catalyze the formation of fumarate and AMP
from adenylosuccinate. - Pseudomonas putida 3-carboxy-cis,cis-muconate cycloisomerase (EC 5.5.1.2)
(3-carboxymuconate lactonizing enzyme) (gene pcaB) [4], an enzyme involved in aromatic acids catabolism.
[0987] Consensus pattern: G-S-x(2)-M-x(2)-K-x-N

[ 1] Woods S.A., Schwartzbach S.D., Guest J.R. Biochim. Biophys. Acta 954:14-26(1988).
[ 2] Woods S.A., Miles J.S., Guest J.R. FEMS Microbiol. Lett. 51:181-186(1988).
[ 3] Zalkin H., Dixon J.E. Prog. Nucleic Acid Res. Mol. Biol. 42:259-287(1992).
[ 4] Williams S.E., Woolridge E.M., Ransom S.C., Landro J.A., Babbitt P.C., Kozarich J.W. Biochemistry 31:
9768-9776(1992).

[0988] 340. MCM family signature and profile 

Proteins shown to be required for the initiation of eukaryotic DNA replication share a highly conserved domain of about
210 amino-acid residues [1,2,3]. The latter shows some similarities [4] with that of various other families of
DNA-dependent ATPases. Eukaryotes seem to possess a family of six proteins that contain this domain. They were first
identified in yeast where most of them have a direct role in the initiation of chromosomal DNA replication by interacting
directly with autonomously replicating sequences (ARS). They were thus called 'minichromosome maintenance
proteins' with gene symbols prefixed by MCM. These six proteins are: - MCM2, also known as cdc19 (in S.pombe) [E1].
- MCM3, also known as DNA polymerase alpha holoenzyme-associated protein P1, RLF beta subunit or ROA. - MCM4,
also known as CDC54, cdc21 (in S.pombe) or dpa (in Drosophila). - MCM5, also known as CDC46 or nda4 (in
S.pombe). - MCM6, also known as mis5 (in S.pombe). - MCM7, also known as CDC47 or Prolifera (in A.thaliana). This
family is also present in archaebacteria. In Methanococcus jannaschii there are four members: MJ0363, MJ0961, MJ1489
and MJECL13. The presence of a putative ATP-binding domain implies that these proteins may be involved in an
ATP-consuming step in the initiation of DNA replication in eukaryotes. As a signature pattern, a perfectly conserved region
was selected that represents a special version of the B motif found in ATP-binding proteins.
[0989] Consensus pattern: G-[IVT]-[LVAC](2)-[IVT]-D-[DE]-[FL]-[DNST] 

[ 1] Coxon A., Maundrell K., Kearsey S.E. Nucleic Acids Res. 20:5571-5577(1992).
[ 2] Hu B., Burkhart R., Schulte D., Musahl C., Knippers R. Nucleic Acids Res. 21:5289-5293(1993).
[ 3] Tye B.-K. Trends Cell Biol. 4:160-166(1994).
[ 4] Koonin E.V. Nucleic Acids Res. 21:2541-2547(1993).

[0990] 341. Macrophage migration inhibitory factor family signature (MIF)

A protein called macrophage migration inhibitory factor (MIF) [1] seems to exert an important role in host inflammatory
responses. It plays a pivotal role in the host response to endotoxic shock and appears to serve as a pituitary "stress
hormone" that regulates systemic inflammatory responses. MIF is a secreted protein of 115 residues which is not
processed from a larger precursor. D-dopachrome tautomerase [2] is a mammalian cytoplasmic enzyme involved in melanin
biosynthesis that tautomerizes D-dopachrome with concomitant decarboxylation to give 5,6-dihydroxyindole (DHI).
It is a protein of 117 residues highly related to MIF. It must be noted that MIF binds glutathione and has been said to
be related to glutathione S-transferases. This assertion has been later disproved [3]. As a signature pattern for these
proteins, a conserved region was selected located in the central section.
[0991] Consensus pattern: [DE]-P-C-A-x(3)-[LIVM]-x-S-I-G-x-[LIVM]-G

[ 1] Bucala R. Immunol. Lett. 43:23-26(1994).

[ 2] Odh G., Hindemith A., Rosengren A.-M., Rosengren E., Rorsman H. Biochem. Biophys. Res. Commun. 197:
619-624(1993).

[ 3] Pearson W.R. Protein Sci. 3:525-527(1994).

[0992] 342. MIP family signature

Recently the sequences of a number of different proteins, that all seem to be transmembrane channel proteins, have
been found to be highly related [1 to 4]. These proteins are listed below. - Mammalian major intrinsic protein (MIP). MIP
is the major component of lens fiber gap junctions. Gap junctions mediate direct exchange of ions and small molecules
from one cell to another. - Mammalian aquaporins [5]. These proteins form water-specific channels that provide the
plasma membranes of red cells and kidney proximal and collecting tubules with high permeability to water, thereby
permitting water to move in the direction of an osmotic gradient. - Soybean nodulin-26, a major component of the
peribacteroid membrane induced during nodulation in legume roots after Rhizobium infection. - Plant tonoplast intrinsic
proteins (TIP). There are various isoforms of TIP: alpha (seed), gamma, Rt (root), and Wsi (water-stress induced).
These proteins may allow the diffusion of water, amino acids and/or peptides from the tonoplast interior to the cytoplasm.
- Bacterial glycerol facilitator protein (gene glpF), which facilitates the movement of glycerol across the cytoplasmic
membrane. - Salmonella typhimurium propanediol diffusion facilitator (gene pduF). - Yeast FPS1, a glycerol uptake/
efflux facilitator protein. - Drosophila neurogenic protein 'big brain' (bib). This protein may mediate intercellular
communication; it may function by allowing the transport of certain molecule(s) and thereby sending a signal for an
ectodermal cell to become an epidermoblast instead of a neuroblast. - Yeast hypothetical protein YFL054c. - A
hypothetical protein from the pepX region of Lactococcus lactis. The MIP family proteins seem to contain six transmembrane
segments. Computer analysis shows that these proteins probably arose by a tandem, intragenic duplication event from
an ancestral protein that contained three transmembrane segments. As a signature pattern a well conserved region
was selected which is located in a probable cytoplasmic loop between the second and third transmembrane regions.

[0993] Consensus pattern: [HNQA]-x-N-P-[STA]-[LIVMF]-[ST]-[LIVMF]-[GSTAFY]

[ 1] Reizer J., Reizer A., Saier M.H. Jr. CRC Crit. Rev. Biochem. 28:235-257(1993).
[ 2] Baker M.E., Saier M.H. Jr. Cell 60:185-186(1990).

[ 3] Pao G.M., Wu L-F., Johnson K.D., Hoefte H., Chrispeels M.J., Sweet G., Sandal N.N., Saier M.H. Jr. Mol.
Microbiol. 5:33-37(1991).

[ 4] Wistow G.J., Pisano M.M., Chepelinsky A.B. Trends Biochem. Sci. 16:170-171(1991).
[ 5] Chrispeels M.J., Agre P. Trends Biochem. Sci. 19:421-425(1994).
[0994] 343. Mandelate racemase / muconate lactonizing enzyme family signatures

Mandelate racemase (EC 5.1.2.2) (MR) and muconate lactonizing enzyme (EC 5.5.1.1) (MLE) are two bacterial
enzymes involved in aromatic acid catabolism. They catalyze mechanistically distinct reactions yet they are related at
the level of their primary, quaternary (homooctamer) and tertiary structures [1,2]. A number of other proteins also seem
to be evolutionarily related to these two enzymes. These are: - The various plasmid-encoded chloromuconate
cycloisomerases (EC 5.5.1.7). - Escherichia coli protein rspA [3]. rspA seems to be involved in the degradation of homoserine
lactone (HSL) or of one of its metabolites. - Escherichia coli hypothetical protein ycjG. - Escherichia coli hypothetical
protein yidU. - A hypothetical protein from Streptomyces ambofaciens [4]. Two signature patterns have been developed
for these enzymes; both contain conserved acidic residues.

[0995] The second pattern contains an aspartate and a glutamate which are ligands for either a magnesium ion (in
MR) or a manganese ion (in MLE).

[0996] Consensus pattern: A-x-[SAGCN]-[SAG]-[LIVM]-[DEQ]-x-A-[LA]-x-[DE]-[LIA]-x-[GA]-[KRQ]-x(4)-[PSA]-
[LIV]-x(2)-L-[LIVMF]-G

Consensus pattern: [LIVF]-x(2)-D-x-[NH]-x(7)-[ACL]-x(6)-[LIVMF]-x(7)-[LIVM]-E-[DENQ]-P [D and E bind a divalent
metal ion]

[ 1] Neidhart D.J., Kenyon G.L., Gerlt J.A., Petsko G.A. Nature 347:692-694(1990).

[ 2] Petsko G.A., Kenyon G.L., Gerlt J.A., Ringe D., Kozarich J.W. Trends Biochem. Sci. 18:372-376(1993).
[ 3] Huisman G.W., Kolter R. Science 265:537-539(1994).
[ 4] Schneider D., Aigle B., Leblond P., Simonet J.M., Decaris B. J. Gen. Microbiol. 139:2559-2567(1993).

[0997] 344. Merozoite Surface Antigen 2 (MSA-2) family 

[0998] Thomas AW, Carr DA, Carter JM, Lyon JA, Mol Biochem Parasitol 1990;43:211-220. 
[0999] 345. MSP (Major sperm protein) domain. 
[1000] Major sperm proteins are involved in sperm motility. These proteins oligomerise to form filaments. Partial
matches to this domain are also found in other non-MSP proteins. These include Swiss:P40075 and Swiss:P34593.
[1001] [1] Bullock TL, Roberts TM, Stewart M, J Mol Biol 1996;263:284-296. [2] King KL, Stewart M, Roberts TM, 
Seavy M, J Cell Sci 1992;101:847-857. 

[1002] 346. (Matrix) Viral matrix protein. Found in Morbillivirus, paramyxovirus and pneumovirus. Number of
members: 105

[1003] 347. O-methyltransferase (methyltransf)

[1004] This family includes a range of O-methyltransferases. These enzymes utilise S-adenosyl methionine.
[1005] [1] Keller NP, Dischinger HC, Bhatnagar D, Cleveland TE, Ullah AH, Appl Environ Microbiol 1993;59:479-484.
[1006] 348. Magnesium chelatase, subunit ChlI
[1007] Magnesium-chelatase is a three-component enzyme that catalyses the insertion of Mg2+ into protoporphyrin
IX. This is the first unique step in the synthesis of (bacterio)chlorophyll. Due to this, it is thought that Mg-chelatase has
an important role in channeling intermediates into the (bacterio)chlorophyll branch in response to conditions suitable
for photosynthetic growth. ChlI and BchD have molecular weights between 38 and 42 kDa.

[1008] [1] Walker CJ, Willows RD, Biochem J 1997;327:321-333. [2] Petersen BL, Jensen PE, Gibson LC, Stummann
BM, Hunter CN, Henningsen KW, J Bacteriol 1998;180:699-704.
[1009] 349. Plasmid recombination enzyme (Mob_Pre) 

[1010] With some plasmids, recombination can occur in a site specific manner that is independent of RecA. In such 
cases, the recombination event requires another protein called Pre. Pre is a plasmid recombination enzyme. This 
protein is also known as Mob (conjugative mobilization).
[1011] [1] Priebe SD, Lacks SA, J Bacteriol 1989;171:4778-4784.
[1012] 350. Monooxygenase 

[1013] This family includes diverse enzymes that utilise FAD. 

[1014] [1] Gatti DL, Palfey BA, Lah MS, Entsch B, Massey V, Ballou DP, Ludwig ML, Science 1994;266:110-114. 
[1015] 351. Mov34 family 

[1016] Members of this family are found in proteasome regulatory subunits, eukaryotic initiation factor 3 (eIF3)
subunits and regulators of transcription factors.

[1017] [1] Aravind L, Ponting CP, Protein Sci 1998;7:1250-1254. [2] Hershey JW, Asano K, Naranda T, Vornlocher
HP, Hanachi P, Merrick WC, Biochimie 1996;78:903-907.
[1018] 352. Myc amino-terminal region (Myc_N_term) 
[1019] The myc family belongs to the basic helix-loop-helix leucine zipper class of transcription factors, see HLH.
Myc forms a heterodimer with Max, and this complex regulates cell growth through direct activation of genes involved
in cell replication [2].

[1020] [1] Facchini LM, Penn LZ, FASEB J 1998;12:633-651. [2] Grandori C, Eisenman RN, Trends Biochem Sci
1997;22:177-181.

[1021] 353. (Metallothio_2) Metallothionein. Members of this family are metallothioneins. These proteins are cysteine
rich proteins that bind to heavy metals. Members of this family appear to be closest to Class II metallothioneins, see
metalthio. Number of members: £35

[1022] [1] Medline: 0G2G7202. Characterization of gene repertoires at mature stage of citrus fruits through random
sequencing and analysis of redundant metallothionein-like genes expressed during fruit development. Moriguchi T,
Kita M, Hisada S, Endo-Inagaki T, Omura M; Gene 1998;211:221-227.
[1023] 354. MAGE family

[1024] The MAGE (melanoma antigen-encoding gene) family are expressed in a wide variety of tumors but not in
normal cells, with the exception of the male germ cells, placenta, and, possibly, cells of the developing embryo. The
cellular function of this family is unknown.

[1025] [1] McCurdy DK, Tai LQ, Nguyen J, Wang Z, Yang HM, Udar N, Naiem F, Concannon P, Gatti RA; Mol Genet
Metab 1998;63:3-13.

[1026] 355. Malic enzymes signature. Malic enzymes, or malate oxidoreductases, catalyze the oxidative
decarboxylation of malate into pyruvate, important for a wide range of metabolic pathways. There are three related forms of malic
enzyme [1,2,3]: - NAD-dependent malic enzyme (EC 1.1.1.38), which uses preferentially NAD and has the ability to
decarboxylate oxaloacetate (OAA). It is found in bacteria and insects. - NAD-dependent malic enzyme (EC 1.1.1.39),
which uses preferentially NAD and is unable to decarboxylate OAA. It is found in the mitochondrial matrix of plants
and is a heterodimer of highly related subunits. - NADP-dependent malic enzyme (EC 1.1.1.40), which has a preference
for NADP and has the ability to decarboxylate OAA. This form has been found in fungi, animals and plants. In mammals,
there are two isozymes: one mitochondrial and the other cytosolic. Plants also have two isozymes: chloroplastic and
cytosolic. There are two other proteins which are closely structurally related to malic enzymes: - Escherichia coli protein
sfcA, whose function is not yet known but which could be an NAD or NADP-dependent malic enzyme. - Yeast
hypothetical protein YKL029c, a probable malic enzyme. There are three well conserved regions in the enzyme sequences.
Two of them seem to be involved in binding NAD or NADP. The significance of the third one, located in the central part
of the enzymes, is not yet known. This region has been developed as a signature pattern for these enzymes.
[1027] Consensus pattern: F-x-[DV]-D-x(2)-G-T-[GSA]-x-[IV]-x-[LIVMA]-[GAST](2)-[LIVMF](2)
[1028] [ 1] Artus N.N., Edwards G.E. FEBS Lett. 162:225-233(1985). [ 2] Loeber G., Infante A.A., Maurer-Fogy I.,
Krystek E., Dworkin M.B. J. Biol. Chem. 266:3016-3021(1991). [ 3] Long J.J., Wang J.-L., Berry J.O. J. Biol. Chem.
269:2827-2833(1994).
[1029] 356. (matrixin) 

Matrixins cysteine switch (aka peptidase_M10) 

[1030] Mammalian extracellular matrix metalloproteinases (EC 3.4.24.-), also known as matrixins [1] (see
<PDOC00129>), are zinc-dependent enzymes. They are secreted by cells in an inactive form (zymogen) that differs
from the mature enzyme by the presence of an N-terminal propeptide. A highly conserved octapeptide is found two
residues downstream of the C-terminal end of the propeptide. This region has been shown to be involved in
autoinhibition of matrixins [2,3]; a cysteine within the octapeptide chelates the active site zinc ion, thus inhibiting the enzyme.
This region has been called the 'cysteine switch' or 'autoinhibitor region'.
[1031] A cysteine switch has been found in the following zinc proteases:

- MMP-1 (EC 3.4.24.7) (interstitial collagenase).
- MMP-2 (EC 3.4.24.24) (72 Kd gelatinase).
- MMP-3 (EC 3.4.24.17) (stromelysin-1).
- MMP-7 (EC 3.4.24.23) (matrilysin).
- MMP-8 (EC 3.4.24.34) (neutrophil collagenase).
- MMP-9 (EC 3.4.24.35) (92 Kd gelatinase).
- MMP-10 (EC 3.4.24.22) (stromelysin-2).
- MMP-11 (EC 3.4.24.-) (stromelysin-3).
- MMP-12 (EC 3.4.24.65) (macrophage metalloelastase).
- MMP-13 (EC 3.4.24.-) (collagenase 3).
- MMP-14 (EC 3.4.24.-) (membrane-type matrix metalloproteinase 1).
- MMP-15 (EC 3.4.24.-) (membrane-type matrix metalloproteinase 2).
- MMP-16 (EC 3.4.24.-) (membrane-type matrix metalloproteinase 3).
- Sea urchin hatching enzyme (EC 3.4.24.12) (envelysin) [4].
- Chlamydomonas reinhardtii gamete lytic enzyme (GLE) [5].

[1032] Consensus pattern: P-R-C-[GN]-x-P-[DR]-[LIVSAPKQ] [C chelates the zinc ion]. Sequences known to belong
to this class detected by the pattern: ALL, except for cat MMP-7 and mouse MMP-11.

[ 1] Woessner J. Jr. FASEB J. 5:2145-2154(1991).

[ 2] Sanchez-Lopez R., Nicholson R., Gesnel M.C., Matrisian L.M., Breathnach R. J. Biol. Chem. 263:11892-11899
(1988).

[ 3] Park A.J., Matrisian L.M., Kells A.F., Pearson R., Yuan Z., Navre M. J. Biol. Chem. 266:1584-1590(1991).
[ 4] Lepage T., Gache C. EMBO J. 9:3003-3012(1990).

[ 5] Kinoshita T., Fukuzawa K., Shimada T., Saito T., Matsuda Y. Proc. Natl. Acad. Sci. U.S.A. 89:4693-4697(1992).
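
Because the cysteine-switch pattern for matrixins fixes the zinc-chelating cysteine at its third position, a match also locates that residue in the sequence. The sketch below is illustrative only: the regular expression is a direct transcription of P-R-C-[GN]-x-P-[DR]-[LIVSAPKQ], while the function name and the short propeptide fragment are hypothetical and chosen merely to contain one match.

```python
import re

# Direct transcription of P-R-C-[GN]-x-P-[DR]-[LIVSAPKQ]; 'x' becomes '.'.
CYS_SWITCH = re.compile(r"PRC[GN].P[DR][LIVSAPKQ]")

def chelating_cysteines(sequence):
    """Return the 1-based position of the pattern's Cys for every match.

    m.start() is the 0-based index of the leading Pro; the Cys is two
    residues further on, so its 1-based position is m.start() + 3.
    """
    return [m.start() + 3 for m in CYS_SWITCH.finditer(sequence)]

# Hypothetical propeptide fragment, for illustration only.
toy_propeptide = "MKQPRCGVPDVAQF"
print(chelating_cysteines(toy_propeptide))
```

A sequence in which the cysteine is replaced (for example by serine) no longer matches, which reflects why the pattern is usable as a signature for the autoinhibitor region.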

[1033] 357. Vertebrate metallothioneins signature (metalthio) 

Metallothioneins (MT) [1,2,3] are small proteins which bind heavy metals such as zinc, copper, cadmium, nickel, etc.,
through clusters of thiolate bonds. MT's occur throughout the animal kingdom and are also found in higher plants, fungi
and some prokaryotes. On the basis of structural relationships MT's have been subdivided into three classes. Class I
includes mammalian MT's as well as MT's from crustaceans and molluscs, but with clearly related primary structure.
Class II groups together MT's from various species such as sea urchins, fungi, insects and cyanobacteria which display
none or only very distant correspondence to class I MT's. Class III MT's are atypical polypeptides containing
gamma-glutamylcysteinyl units. Vertebrate class I MT's are proteins of 60 to 68 amino acid residues; 20 of these residues are
cysteines that bind to 7 bivalent metal ions. As a signature pattern a region that spans 19 residues and which contains
seven of the metal-binding cysteines was chosen; this region is located in the N-terminal section of class I MT's.
[1034] Consensus pattern: C-x-C-[GSTAP]-x(2)-C-x-C-x(2)-C-x-C-x(2)-C-x-K- 
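
The description above states that the signature spans 19 residues and contains seven cysteines; this can be checked directly against the consensus pattern. The following sketch is only an illustration under simple assumptions: the helper name `expand_pattern` is hypothetical, and it handles fixed-length patterns only (no `x(n,m)` ranges).

```python
import re

def expand_pattern(pattern):
    """Expand a fixed-length PROSITE-style pattern into one entry per residue position."""
    positions = []
    for element in pattern.strip().rstrip("-").split("-"):
        m = re.fullmatch(r"(\[[A-Z]+\]|[A-Zx])(?:\((\d+)\))?", element)
        if m is None:
            raise ValueError(f"unsupported pattern element: {element!r}")
        # Repeat the element n times when a (n) count is given, once otherwise.
        positions.extend([m.group(1)] * int(m.group(2) or 1))
    return positions

MT_PATTERN = "C-x-C-[GSTAP]-x(2)-C-x-C-x(2)-C-x-C-x(2)-C-x-K"
positions = expand_pattern(MT_PATTERN)

# 1-based positions within the pattern that are fixed as cysteine.
cysteines = [i for i, p in enumerate(positions, start=1) if p == "C"]
print(len(positions), len(cysteines), cysteines)
```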

[ 1] Hamer D.H. Annu. Rev. Biochem. 55:913-951(1986).

[ 2] Kagi J.H.R., Schaffer A. Biochemistry 27:8509-8515(1988). 
[ 3] Binz P.-A. Thesis, 1996, University of Zurich. 

[1035] 358. Mitochondrial energy transfer proteins signature (mito_carr) 

Different types of substrate carrier proteins involved in energy transfer are found in the inner mitochondrial membrane
[1 to 5]. These are: - The ADP,ATP carrier protein (AAC) (ADP/ATP translocase) which exports ATP into the cytosol
and imports ADP into the mitochondrial matrix. The sequence of AAC has been obtained from various mammalian,
plant and fungal species. - The 2-oxoglutarate/malate carrier protein (OGCP), which exports 2-oxoglutarate into the
cytosol and imports malate or other dicarboxylic acids into the mitochondrial matrix. This protein plays an important
role in several metabolic processes such as the malate/aspartate and the oxoglutarate/isocitrate shuttles. - The
phosphate carrier protein, which transports phosphate groups from the cytosol into the mitochondrial matrix. - The brown
fat uncoupling protein (UCP) which dissipates oxidative energy into heat by transporting protons from the cytosol into
the mitochondrial matrix. - The tricarboxylate transport protein (or citrate transport protein) which is involved in
citrate-H+/malate exchange. It is important for the bioenergetics of hepatic cells as it provides a carbon source for fatty acid
and sterol biosyntheses, and NAD for the glycolytic pathway. - The Grave's disease carrier protein (GDC), a protein
of unknown function recognized by IgG in patients with active Grave's disease. - Yeast mitochondrial proteins MRS3
and MRS4. The exact function of these proteins is not known. They suppress a mitochondrial splice defect in the first
intron of the COB gene and may act as carriers, exerting their suppressor activity by modulating solute concentrations
in the mitochondrion. - Yeast mitochondrial FAD carrier protein (gene FLX1). - Yeast protein ACR1 [6], which seems
essential for acetyl-CoA synthetase activity. - Yeast protein PET8. - Yeast protein PMT. - Yeast protein RIM2. - Yeast
protein YHM1/SHM1. - Yeast protein YMC1. - Yeast protein YMC2. - Yeast hypothetical proteins YBR291C, YEL006w,
YER053c, YFR045W, YHR002w, and YIL006w. - Caenorhabditis elegans hypothetical protein K11H3.3. Two other
proteins have been found to belong to this family, yet are not localized in the mitochondrial inner membrane: - Maize
amyloplast Brittle-1 protein. This protein, found in the endosperm of kernels, could play a role in amyloplast membrane
transport. - Candida boidinii peroxisomal membrane protein PMP47 [7]. PMP47 is an integral membrane protein of the
peroxisome and it may play a role as a transporter. These proteins all seem to be evolutionarily related. Structurally,
they consist of three tandem repeats of a domain of approximately one hundred residues. Each of these domains
contains two transmembrane regions. As a signature pattern, one of the most conserved regions in the repeated domain
was selected, located just after the first transmembrane region.

[1036] Consensus pattern: P-x-[DE]-x-[LIVAT]-[RK]-x-[LRH]-[LIVMFY]-[QGAIVM]

[ 1] Klingenberg M. Trends Biochem. Sci. 15:108-112(1990).
[ 2] Walker J.E. Curr. Opin. Struct. Biol. 2:519-526(1992).
[ 3] Kuan J., Saier M.H. Jr. CRC Crit. Rev. Biochem. 28:209-233(1993).
[ 4] Kuan J., Saier M.H. Jr. Res. Microbiol. 144:671-672(1993).

[ 5] Nelson D.R., Lawson J.E., Klingenberg M., Douglas M.G. J. Mol. Biol. 230:1159-1170(1993).
[ 6] Palmieri F. FEBS Lett. 346:48-54(1994).

[ 7] Jank B., Habermann B., Schweyen R.J., Link T.A. Trends Biochem. Sci. 18:427-428(1993).
[1037] 359. Prokaryotic molybdopterin oxidoreductases signatures (molybdopterin)

A number of different prokaryotic oxidoreductases that require and bind a molybdopterin cofactor have been shown
[1,2,3] to share a number of regions of sequence similarity. These enzymes are: - Escherichia coli respiratory nitrate
reductase (EC 1.7.99.4). This enzyme complex allows the bacteria to use nitrate as an electron acceptor during
anaerobic growth. The enzyme is composed of three different chains: alpha, beta and gamma. The alpha chain (gene narG)
is the molybdopterin-binding subunit. Escherichia coli encodes for a second, closely related, nitrate reductase complex
which also contains a molybdopterin-binding alpha chain (gene narZ). - Escherichia coli anaerobic dimethyl sulfoxide
reductase (DMSO reductase). DMSO reductase is the terminal reductase during anaerobic growth on various sulfoxide
and N-oxide compounds. DMSO reductase is composed of three chains: A, B and C. The A chain (gene dmsA) binds
molybdopterin. - Escherichia coli biotin sulfoxide reductases (genes bisC and bisZ). This enzyme reduces a
spontaneous oxidation product of biotin, BDS, back to biotin. It may serve as a scavenger, allowing the cell to use biotin
sulfoxide as a biotin source. - Methanobacterium formicicum formate dehydrogenase (EC 1.2.1.2). The alpha chain
(gene fdhA) of this dimeric enzyme binds a molybdopterin cofactor. - Escherichia coli formate dehydrogenases -H
(gene fdhF), -N (gene fdnG) and -O (gene fdoG). These enzymes are responsible for the oxidation of formate to carbon
dioxide. In addition to molybdopterin, the alpha (catalytic) subunit also contains an active site selenocysteine. -
Wolinella succinogenes polysulfide reductase chain. This enzyme is a component of the phosphorylative electron transport
system with polysulfide as the terminal acceptor. It is composed of three chains: A, B and C. The A chain (gene psrA)
binds molybdopterin. - Salmonella typhimurium thiosulfate reductase (gene phsA). - Escherichia coli
trimethylamine-N-oxide reductase (EC 1.6.6.9) (gene torA) [4]. - Nitrate reductase (EC 1.7.99.4) from Klebsiella pneumoniae (gene
nasA), Alcaligenes eutrophus, Escherichia coli, Rhodobacter sphaeroides, Thiosphaera pantotropha (gene napA), and
Synechococcus PCC 7942 (gene narB). These proteins range from 715 amino acids (fdhF) to 1246 amino acids (narZ)
in size. Three signature patterns for these enzymes were derived. The first is based on a conserved region in the
N-terminal section and contains two cysteine residues perhaps involved in binding the molybdopterin cofactor. It should
be noted that this region is not present in bisC. The second pattern is derived from a conserved region located in the
central part of these enzymes.

[1038] Consensus pattern: [STAN]-x-[CH]-x(2,3)-C-[STAG]-[GSTVMF]-x-C-x-[LIVMFYW]-x-[LIVMA]-x(3,4)-[DENQKHT]

Consensus pattern: [STA]-x-[STAC](2)-x(2)-[STA]-D-[LIVMY](2)-L-P-x-[STAC](2)-x(2)-E
Consensus pattern: A-x(3)-[GDT]-I-x-[DNQTK]-x-[DEA

[ 1] Wootton J.C., Nicolson R.E., Cock J.M., Walters D.E., Burke J.F., Doyle W.A., Bray R.C. Biochim. Biophys.
Acta 1057:157-185(1991).

[ 2] Bilous P.T., Cole S.T., Anderson W.F., Weiner J.H. Mol. Microbiol. 2:785-795(1988).
[ 3] Trieber C.A., Rothery R.A., Weiner J.H. J. Biol. Chem. 269:7103-7109(1994).
[ 4] Mejean V., Iobbi-Nivol C., Lepelletier M., Giordano G., Chippaux M., Pascal M.-C. Mol. Microbiol. 11:1169-1179
(1994).

[1039] 360. Bacterial mutT domain signature 

The bacterial mutT protein is involved in the GO system [1] responsible for removing an oxidatively damaged form of
guanine (8-hydroxyguanine or 7,8-dihydro-8-oxoguanine) from DNA and the nucleotide pool. 8-oxo-dGTP is inserted
opposite to dA and dC residues of template DNA with almost equal efficiency, thus leading to A.T to G.C transversions.
MutT specifically degrades 8-oxo-dGTP to the monophosphate with the concomitant release of pyrophosphate. MutT
is a small protein of about 12 to 15 Kd. It has been shown [2,3] that a region of about 40 amino acid residues, which
is found in the N-terminal part of mutT, can also be found in a variety of other prokaryotic, viral, and eukaryotic proteins.
These proteins are:

- Streptococcus pneumoniae mutX.
- A mutT homolog from plasmid pSAM2 of Streptomyces ambofaciens.
- Bartonella bacilliformis invasion protein A (gene invA).
- Escherichia coli dATP pyrophosphohydrolase.
- Protein D250 from African swine fever viruses.
- Proteins D9 and D10 from a variety of poxviruses.
- Mammalian 7,8-dihydro-8-oxoguanine triphosphatase (EC 3.1.6.-) [4].
- Mammalian diadenosine 5',5'''-P1,P4-tetraphosphate asymmetrical hydrolase (Ap4Aase) (EC 3.6.1.17) [5], which
  cleaves A-5'-PPPP-5'A to yield AMP and ATP.
- A protein encoded on the antisense RNA of the basic fibroblast growth factor gene in higher vertebrates.
- Yeast protein YSA1.
- Escherichia coli hypothetical protein yfaO.
- Escherichia coli hypothetical protein ygdU and HI0901, the corresponding Haemophilus influenzae protein.
- Escherichia coli hypothetical protein yjaD and HI0432, the corresponding Haemophilus influenzae protein.
- Escherichia coli hypothetical protein yrfE.
- Bacillus subtilis hypothetical protein yqkG.
- Bacillus subtilis hypothetical protein yzgD.
- Yeast hypothetical protein YGL067w.

[1040] It is proposed [2] that the conserved domain could be involved in the active center of a family of
pyrophosphate-releasing NTPases. As a signature pattern the core region of the domain was selected; it contains four conserved
glutamate residues.

[1041] Consensus pattern: G-x(5)-E-x(4)-[STAGC]-[LIVMAC]-x-R-E-[LIVMFT]-x-E-E- 

[1] Michaels M.L., Miller J.H. J. Bacteriol. 174:6321-6325(1992).
[2] Koonin E.V. Nucleic Acids Res. 21:4847-4847(1993).
[3] Mejean V., Salles C., Bullions M.J., Bessman M.J., Claverys J.-P. Mol. Microbiol. 11:323-330(1994).

[4] Sakumi K., Furuichi M., Tsuzuki T., Kakuma T., Kawabata S., Maki H., Sekiguchi M. J. Biol. Chem. 268:
23524-23530(1993).

[5] Thorne N.M.H., Hankin S., Wilkinson M.C., Nunez C., Barraclough R., McLennan A.G. Biochem. J. 311:717-721
(1995).


[1042] 361. Myb DNA-binding domain repeat signatures 

The retroviral oncogene v-myb, and its cellular counterpart c-myb, encode nuclear DNA-binding proteins that
specifically recognize the sequence YAAC(G/T)G [1]. The myb family also includes the following proteins: - Drosophila
D-myb [2]. - Vertebrate myb-like proteins A-myb and B-myb [3]. - Maize C1 protein, a trans-acting factor which controls
the expression of genes involved in anthocyanin biosynthesis. - Maize P protein [4], a trans-acting factor which regulates
the biosynthetic pathway of a flavonoid-derived pigment in certain floral tissues. - Arabidopsis thaliana protein GL1 [5],
required for the initiation of differentiation of leaf hair cells (trichomes). - A number of myb/c1-related proteins in maize
and barley, whose roles are not yet known [4]. - Yeast BAS1 [7], a transcriptional activator for the HIS4 gene. - Yeast
REB1 [8], which recognizes sites within both the enhancer and the promoter of rRNA transcription, as well as upstream
of many genes transcribed by RNA polymerase II. - Fission yeast cdc5, a possible transcription factor whose activity
is required for cell cycle progression and growth during G2. - Fission yeast myb1, which regulates telomere length and
function. - Yeast hypothetical protein YMR213w. One of the most conserved regions in all of these proteins is a domain
of 160 amino acids. It consists of three tandem repeats of 51 to 53 amino acids. In myb, this repeat region has been
shown [9] to be involved in DNA-binding. The major part of the first repeat is missing in retroviral v-myb sequences
and in plant myb-related proteins. Yeast REB1 differs from the other proteins in this family in having a single myb-like
domain. As shown in the following schematic representation, two signature patterns for myb-like domains were
developed; the first is located in the N-terminal section, the second spans the C-terminal extremity of the domain.

xxxxxxxxxWxxxEDxxxxxxxxxxxxxx WxxIxxxxxxRxxxxxxxxWxxxx
         *********            ************************

'*': position of the patterns.

[1043] Consensus pattern: W-[ST]-x(2)-E-[DE]-x(2)-[LIV]

Consensus pattern: W-x(2)-[LI]-[SAG]-x(4,5)-R-x(8)-[YW]-x(3)-[LIVM]- 

Note: this pattern detects the three copies of the domain in myb, d-myb, A-myb and B-myb; the second of the two 
complete copies of plant myb-related proteins, and the last two copies of yeast BAS1 
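
Since the second pattern is expected to hit each complete copy of the repeat, counting its matches gives a rough estimate of the number of myb-like repeats in a sequence. The sketch below is an illustration under stated assumptions: the regular expression is a hand transcription of W-x(2)-[LI]-[SAG]-x(4,5)-R-x(8)-[YW]-x(3)-[LIVM], and the two repeat strings and the toy sequence are invented so that one repeat uses the 4-residue spacer and the other the 5-residue spacer.

```python
import re

# Hand transcription of W-x(2)-[LI]-[SAG]-x(4,5)-R-x(8)-[YW]-x(3)-[LIVM].
# The variable spacer x(4,5) becomes the bounded quantifier .{4,5}.
MYB_PATTERN_2 = re.compile(r"W.{2}[LI][SAG].{4,5}R.{8}[YW].{3}[LIVM]")

def find_repeats(sequence):
    """Return (1-based start, matched text) for each non-overlapping match."""
    return [(m.start() + 1, m.group()) for m in MYB_PATTERN_2.finditer(sequence)]

# Invented repeat units, for illustration only.
REPEAT_A = "WTKIAKLLPRTDVQCQHRWQKVL"    # spacer of 4 residues before the R
REPEAT_B = "WSELSQHLKGRIGKQCRERWHNHL"   # spacer of 5 residues before the R
toy = "MG" + REPEAT_A + "NPEVKKT" + REPEAT_B + "PE"

hits = find_repeats(toy)
for start, text in hits:
    print(start, len(text), text)
```

`finditer` reports non-overlapping matches only, which is adequate here because successive repeats are separated by roughly fifty residues.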

[ 1] Biedenkapp H., Borgmeyer U., Sippel A.E., Klempnauer K.-H. Nature 335:835-837(1988).

[ 2] Peters C.W.B., Sippel A.E., Vingron M., Klempnauer K.-H. EMBO J. 6:3085-3090(1987).

[ 3] Nomura N., Takahashi M., Matsui M., Ishii S., Date T., Sasamoto S., Ishizaki R. Nucleic Acids Res. 16:
11075-11090(1988).

[ 4] Grotewold E., Athma P., Peterson T. Proc. Natl. Acad. Sci. U.S.A. 88:4587-4591(1991).
[ 5] Oppenheimer D.G., Herman P.L., Sivakumaran S., Esch J., Marks M.D. Cell 67:483-493(1991).

[ 6] Marocco A., Wissenbach M., Becker D., Paz-Ares J., Saedler H., Salamini F., Rohde W. Mol. Gen. Genet. 216:
183-187(1989).

[ 7] Tice-Baldwin K., Fink G.R., Arndt K.T. Science 246:931-935(1989).
[ 8] Ju Q., Morrow B.E., Warner J.R. Mol. Cell. Biol. 10:5226-5234(1990).
[ 9] Klempnauer K.-H., Sippel A.E. EMBO J. 6:2719-2725(1987).


[1044] 362. NAD-dependent glycerol-3-phosphate dehydrogenase signature
NAD-dependent glycerol-3-phosphate dehydrogenase (EC 1.1.1.8) (GPD) catalyzes the reversible reduction of
dihydroxyacetone phosphate to glycerol-3-phosphate. It is a eukaryotic cytosolic homodimeric protein of about 40 Kd. As
a signature pattern a glycine-rich region that is probably [1] involved in NAD-binding was selected.
[1045] Consensus pattern: G-[AT]-[LIVM]-K-[DN]-[LIVM](2)-A-x-[GA]-x-G-[LIVMF]-x-[DE]-G-[LIVM]-x-[LIVMFYW]-
G-x-N

[1046] [ 1] Otto J., Argos P., Rossmann M.G. Eur. J. Biochem. 109:325-330(1980).
[1047] 363. Nucleosome assembly protein (NAP) 

[1048] It is thought that NAPs may be involved in regulating gene expression as a result of histone accessibility [1]. 

[1] Rodriguez P, Munroe D, Prawitt D, Chu LL, Bric E, Kim J, Reid LH, Davies C, Nakagama H, Loebbert R,
Winterpacht A, Petruzzi MJ, Higgins MJ, Nowak N, Evans G, Shows T, Weissman BE, Zabel B, Housman DE,
Pelletier J, Genomics 1997;44:253-265.

[2] Schnieders F, Dork T, Arnemann J, Vogel T, Werner M, Schmidtke J; Hum Mol Genet 1996;5:1801-1807.

[1049] 364. NB-ARC domain
van der Biezen EA, Jones JD, Curr Biol 1998;8:226-227.

[1050] 365. Nucleoside diphosphate kinases active site 

[1051] Nucleoside diphosphate kinases (EC 2.7.4.6) (NDK) [1] are enzymes required for the synthesis of nucleoside
triphosphates (NTP) other than ATP. They provide NTPs for nucleic acid synthesis, CTP for lipid synthesis, UTP for
polysaccharide synthesis and GTP for protein elongation, signal transduction and microtubule polymerization. In
eukaryotes, there seems to be a small family of NDK isozymes each of which acts in a different subcellular compartment
and/or has a distinct biological function. Eukaryotic NDK isozymes are hexamers of two highly related chains (A and
B) [2]. By random association (A6, A5B...AB5, B6), these two kinds of chain form isoenzymes differing in their isoelectric
point. NDK are proteins of 17 Kd that act via a ping-pong mechanism in which a histidine residue is phosphorylated
by transfer of the terminal phosphate group from ATP. In the presence of magnesium, the phosphoenzyme can transfer
its phosphate group to any NDP, to produce an NTP. NDK isozymes have been sequenced from prokaryotic and
eukaryotic sources. It has also been shown [3] that the Drosophila awd (abnormal wing discs) protein is a
microtubule-associated NDK. Mammalian NDK is also known as metastasis inhibition factor nm23. The sequence of NDK has been
highly conserved through evolution. There is a single histidine residue conserved in all known NDK isozymes, which
is involved in the catalytic mechanism [2]. Our signature pattern contains this residue.
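
The random association of A and B chains into hexamers (A6, A5B ... AB5, B6) gives seven possible subunit compositions. The sketch below enumerates them and, purely as an illustrative assumption not stated in the text, attaches binomial frequencies for the case where each of the six positions is filled independently and chain A makes up a given fraction of the pool (default 0.5). The function and parameter names are hypothetical.

```python
from math import comb

def hexamer_distribution(fraction_a=0.5, subunits=6):
    """Map each A/B composition of the oligomer to its binomial frequency."""
    def part(chain, count):
        # "A6" for six copies, "A" for one copy, "" for none.
        if count == 0:
            return ""
        return chain if count == 1 else f"{chain}{count}"

    distribution = {}
    for n_a in range(subunits, -1, -1):       # from all-A down to all-B
        n_b = subunits - n_a
        frequency = comb(subunits, n_a) * fraction_a**n_a * (1 - fraction_a)**n_b
        distribution[part("A", n_a) + part("B", n_b)] = frequency
    return distribution

for name, frequency in hexamer_distribution().items():
    print(f"{name:5s} {frequency:.4f}")
```

Under the equal-abundance assumption the mixed A3B3 species is the most frequent and the homohexamers A6 and B6 the rarest; a different value of `fraction_a` shifts the distribution accordingly.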

[1052] Consensus pattern: N-x(2)-H-[GA]-S-D-[SA]-[LIVMPKNE] [H is the putative active site residue]

[ 1] Parks R., Agarwal R. (In) The Enzymes (3rd edition) 8:307-334(1973).

[ 2] Gilles A.-M., Presecan E., Vonica A., Lascu I. J. Biol. Chem. 266:8784-8789(1991).

[ 3] Biggs J., Hersperger E., Steeg P.S., Liotta L.A., Shearn A. Cell 63:933-940(1990).


[1053] 366. Nitrite and sulfite reductases iron-sulfur/siroheme-binding site (NIR_SIR). Nitrite reductases (NiR) [1]
catalyze the reduction of nitrite into ammonium, the second step in the assimilation of nitrate. There are two types of
NiR: the higher plant chloroplastic form of NiR (EC 1.7.7.1) is a monomeric protein that uses reduced ferredoxin as
the electron donor, while fungal and bacterial NiR (EC 1.6.6.4) are homodimeric proteins that use NAD(P)H as the
electron donor. Both forms of NiR contain a siroheme-Fe and iron-sulfur centers. Sulfite reductase (NADPH) (EC
1.8.1.2) (SIR) [2] is the bacterial enzyme that catalyzes the reduction of sulfite to sulfide. SIR is an oligomeric enzyme
with a subunit composition of alpha(8)-beta(4); the alpha component is a flavoprotein (SIR-FP), while the beta
component is a siroheme, iron-sulfur protein (SIR-HP). Sulfite reductase (ferredoxin) (EC 1.8.7.1) [3] is a cyanobacterial
and plant monomeric enzyme that also catalyzes the reduction of sulfite to sulfide. Anaerobic sulfite reductase (EC
1.8.1.-) (ASR) [4] is a bacterial enzyme that catalyzes the NADH-dependent reduction of sulfite to sulfide. ASR is an
oligomeric enzyme composed of three different subunits. The C component (gene asrC) seems to be a siroheme,
iron-sulfur protein. These enzymes share a region of sequence similarity in their C-terminal half; this region, which spans
about 80 amino acids, includes four conserved cysteine residues. Two of the Cys are grouped together at the beginning
of the domain, and the two others are grouped in the middle of the domain. The cysteines are involved in the binding
of the iron-sulfur center; the last one also binds the siroheme group [2]. A signature pattern from the region around the
second cluster of cysteines was derived.

[1054] Consensus pattern: [STV]-G-C-x(3)-C-x(6)-[DE]-[LIVMF]-[GAT]-[LIVMF] [The two C's are iron-sulfur ligands]
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[ 1] Campbell W.H., Kinghorn J.R. Trends Biochem. Sci. 15:315-319(1990). 

[ 2] Crane B.R., Siegel L.M., Getzoff E.D. Science 270:59-67(1995).
[ 3] Gisselmann G., Klausmeier P., Schwenn J.D. Biochim. Biophys. Acta 1144:102-106(1993).
[ 4] Huang C.J., Barrett E.L. J. Bacteriol. 173:1544-1553(1991).


[1055] 367. (NMT) Myristoyl-CoA:protein N-myristoyltransferase signatures. Myristoyl-CoA:protein
N-myristoyltransferase (EC 2.3.1.97) (Nmt) [1] is the enzyme responsible for transferring a myristate group to
the N-terminal glycine of a number of cellular eukaryotic and viral proteins. Nmt is a monomeric protein of about 50 to
60 Kd whose sequence appears to be well conserved. Two highly conserved regions have been developed as signature
patterns. The first one is located in the central section, the second in the C-terminal part.
[1056] Consensus pattern: E-I-N-F-L-C-x-H-K
Consensus pattern: K-F-G-x-G-D-G
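Because the two Nmt signatures are expected in a fixed order (central section first, C-terminal part second), a candidate sequence can be screened for both at once. The sketch below is one possible way to do this, assuming the patterns read E-I-N-F-L-C-x-H-K and K-F-G-x-G-D-G; the function name and the toy sequence are hypothetical.

```python
import re

# Hand-translated regular expressions for the two signatures.
NMT_CENTRAL = re.compile(r"EINFLC.HK")   # E-I-N-F-L-C-x-H-K
NMT_CTERMINAL = re.compile(r"KFG.GDG")   # K-F-G-x-G-D-G


def nmt_signature_hits(sequence: str):
    """Return 1-based starts of both signatures, or None unless both occur in order."""
    first = NMT_CENTRAL.search(sequence)
    if first is None:
        return None
    # Only accept the second signature downstream of the first one.
    second = NMT_CTERMINAL.search(sequence, first.end())
    if second is None:
        return None
    return first.start() + 1, second.start() + 1


if __name__ == "__main__":
    toy = "MAAEINFLCVHKAAAAKFGTGDGAA"  # hypothetical sequence for illustration
    print(nmt_signature_hits(toy))
```

Requiring both signatures in order is stricter than testing either one alone, which may reduce chance matches at the cost of missing fragments that contain only one region.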

[1057] [ 1] Rudnick D.A., McWherter C.A., Gokel G.W., Gordon J.I. Adv. Enzymol. 67:375-430(1993).
[1058] 368. ADP-glucose pyrophosphorylase signatures (NTP_transferase)

[1059] ADP-glucose pyrophosphorylase (glucose-1-phosphate adenylyltransferase) [1,2] (EC 2.7.7.27) catalyzes a
very important step in the biosynthesis of alpha 1,4-glucans (glycogen or starch) in bacteria and plants: synthesis of
the activated glucosyl donor, ADP-glucose, from glucose-1-phosphate and ATP. ADP-glucose pyrophosphorylase is a
tetrameric allosterically regulated enzyme. It is a homotetramer in bacteria, while in plant chloroplasts and amyloplasts
it is a heterotetramer of two different, yet evolutionarily related, subunits. There are a number of conserved regions in
the sequence of bacterial and plant ADP-glucose pyrophosphorylase subunits. Three of these regions were selected
as signature patterns. The first two are N-terminal and have been proposed to be part of the allosteric and/or
substrate-binding sites in the Escherichia coli enzyme (gene glgC). The third pattern corresponds to a conserved region in the
central part of the enzymes.

[1060] Consensus pattern: [AG]-G-G-x-G-[STK]-x-L-x(2)-L-[TA]-x(3)-A-x-P-A-[LV]
Consensus pattern: W-[FY]-x-G-[ST]-A-[DNSH]-[AS]-[LIVMFYW]

Consensus pattern: [APV]-[GS]-M-G-[LIVMN]-Y-[IVC]-[LIVMFY]-x(2)-[DENPHK]

[ 1] Nakata P.A., Greene T.W., Anderson J.M., Smith-White B.J., Okita T.W., Preiss J. Plant Mol. Biol. 17:1089-1093
(1991).

[ 2] Preiss J., Ball K., Hutney J., Smith-White B.J., Li L., Okita T.W. Pure Appl. Chem. 63:535-544(1991).

[1061] 369. Sodium/hydrogen exchanger family 

[1062] Na/H antiporters are key transporters in maintaining the pH of actively metabolizing cells. The molecular
mechanisms of antiport are unclear.
These antiporters contain 10-12 transmembrane regions (M) at the amino-terminus and a large cytoplasmic region at
the carboxyl terminus. The transmembrane regions M3-M12 share identity with other members of the family. The M6
and M7 regions are highly conserved. Thus, this is thought to be the region that is involved in the transport of sodium
and hydrogen ions. The cytoplasmic region has little similarity throughout the family.

[1063] [1] Dibrov P, Fliegel L; FEBS Lett 1998;424:1-5. [2] Orlowski J, Grinstein S; J Biol Chem 1997;272:
22373-22376. [3] Numata M, Petrecca K, Lake N, Orlowski J; J Biol Chem 1998;273:6951-6959.
[1064] 370. Sodium:sulfate symporter family signature (Na_sulph_symp)

Integral membrane proteins that mediate the intake of a wide variety of molecules with the concomitant uptake of
sodium ions (sodium symporters) can be grouped, on the basis of sequence and functional similarities, into a number
of distinct families. One of these families currently consists of the following proteins:

- Mammalian sodium/sulfate cotransporter [1].
- Mammalian renal sodium/dicarboxylate cotransporter [2], which transports succinate and citrate.
- Mammalian intestinal sodium/dicarboxylate cotransporter.
- Chlamydomonas reinhardtii putative sulfur deprivation response regulator SAC1 [3].
- Caenorhabditis elegans hypothetical proteins B0285.6, F31F6.6, K08E5.2 and R107.1.
- Escherichia coli hypothetical protein yfbS.
- Haemophilus influenzae hypothetical protein HI0608.
- Synechocystis strain PCC 6803 hypothetical protein sll0640.
- Methanococcus jannaschii hypothetical protein MJ0672.

These transporters are proteins of from 430 to 620 amino acids which are highly hydrophobic and which probably contain about
12 transmembrane regions. As a signature pattern, a conserved region was selected which is located in or near the
penultimate transmembrane region.

[1065] Consensus pattern: [STACP]-S-x(2)-F-x(2)-P-[LIVM]-[GSA]-x(3)-N-x-[LIVM]-V

[ 1] Markovich D., Forgo J., Stange G., Biber J., Murer H. Proc. Natl. Acad. Sci. U.S.A. 90:8073-8077(1993).

[ 2] Pajor A.M. Am. J. Physiol. 270:642-648(1996).
[ 3] Davies J.P., Yildiz F.H., Grossman A. EMBO J. 15:2150-2159(1996).
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[1066] 371. NifU-like domain . 

[1067] This is an alignment of the carboxy-terminal domain. This is the only common region between the NifU protein
from nitrogen-fixing bacteria and rhodobacterial species. The biochemical function of NifU is unknown [1].
[1068] [1] Ouzounis C, Bork P, Sander C, Trends Biochem Sci 1994;19:199-200.

[1069] 372. Nitrilases/cyanide hydratase signatures

Nitrilases (EC 3.5.5.1) are enzymes that convert nitriles into their corresponding acids and ammonia. They are
widespread in microbes as well as in plants, where they convert indole-3-acetonitrile to the hormone indole-3-acetic acid.
A conserved cysteine has been shown [1,2] to be essential for enzyme activity; it seems to be involved in a nucleophilic
attack on the nitrile carbon atom. Cyanide hydratase (EC 4.2.1.66) converts HCN to formamide. In phytopathogenic
fungi it is used to avoid the toxic effect of cyanide released by wounded plants [3]. The sequence of cyanide hydratase
is evolutionarily related to that of nitrilases. Yeast hypothetical proteins YIL164C and YIL165c also belong to this family.
As signature patterns for these enzymes, two conserved regions were selected. The first is located in the N-terminal
section while the second, which contains the active site cysteine, is located in the central section.
[1070] Consensus pattern: G-x(2)-[LIVMFY](2)-x-[IF]-x-E-x(2)-[LIVM]-x-G-Y-P

Consensus pattern: G-[GAQ]-x(2)-C-[WA]-E-[NH]-x(2)-[PST]-[LIVMFYS]-x-[KR] [C is the active site residue]

[ 1] Kobayashi M., Izui H., Nagasawa T., Yamada H. Proc. Natl. Acad. Sci. U.S.A. 90:247-251(1993).
[ 2] Kobayashi M., Komeda H., Yanaka N., Nagasawa T., Yamada H. J. Biol. Chem. 267:20746-20751(1992).
[ 3] Wang P., Vanetten H.D. Biochem. Biophys. Res. Commun. 187:1048-1054(1992).




[1071] 373. NusB family
[1072] The NusB protein is involved in the regulation of rRNA biosynthesis by transcriptional antitermination.
[1073] Huenges M, Rolz C, Gschwind R, Peteranderl R, Berglechner F, Richter G, Bacher A, Kessler H, Gemmecker
G, EMBO J 1998;17:4092-4100.

[1074] 374. (Neur_chan) Neurotransmitter-gated ion-channels signature

Neurotransmitter-gated ion-channels [1,2,3,4] provide the molecular basis for rapid signal transmission at chemical
synapses. They are post-synaptic oligomeric transmembrane complexes that transiently form an ionic channel upon the
binding of a specific neurotransmitter. Presently, the sequences of subunits from five types of neurotransmitter-gated
receptors are known:

- The nicotinic acetylcholine receptor (AchR), an excitatory cation channel. In the motor endplates
of vertebrates, it is composed of four different subunits (alpha, beta, gamma and delta or epsilon) with a molar
stoichiometry of 2:1:1:1. In neurones, the AchR receptor is composed of two different types of subunits: alpha and non-alpha
(also called beta). Nicotinic AchRs are also found in invertebrates.
- The glycine receptor, an inhibitory chloride ion channel. The glycine receptor is a pentamer composed of two
different subunits (alpha and beta).
- The gamma-aminobutyric-acid (GABA) receptor, which is also an inhibitory chloride ion channel. The quaternary
structure of the GABA receptor is complex; at least four classes of subunits are known to exist (alpha, beta, gamma,
and delta) and there are many variants in each class (for example: six variants of the alpha class have already been
sequenced).
- The serotonin 5HT3 receptor. Serotonin is a biogenic hormone that functions as a neurotransmitter, a hormone and a
mitogen. There are seven major groups of serotonin receptors; six of these groups (5HT1, 5HT2, and 5HT4 to 5HT7)
transduce extracellular signal by activating G proteins, while 5HT3 is a ligand-gated cation-specific ion channel which,
when activated, causes fast, depolarizing responses in neurons.
- The glutamate receptor, an excitatory cation channel. Glutamate is the main excitatory neurotransmitter in the brain.
At least three different types of glutamate receptors have been described and are named according to their selective
agonists (kainate, N-methyl-D-aspartate (NMDA) and quisqualate).

All known sequences of subunits from neurotransmitter-gated ion-channels are structurally related. They
are composed of a large extracellular glycosylated N-terminal ligand-binding domain, followed by three hydrophobic
transmembrane regions which form the ionic channel, followed by an intracellular region of variable length. A fourth
hydrophobic region is found at the C-terminal of the sequence. The sequences of subunits from the AchR, GABA, 5HT3,
and Gly receptors are clearly evolutionarily related and share many regions of sequence similarity. These sequence
similarities are either absent or very weak in the Glu receptors. In the N-terminal extracellular domain of AchR/GABA/
5HT3/Gly receptors, there are two conserved cysteine residues, which, in AchR, have been shown to form a disulfide
bond essential to the tertiary structure of the receptor. A number of amino acids between the two disulfide-bonded
cysteines are also conserved. Therefore this region was used as a signature pattern for this subclass of proteins.
[1075] Consensus pattern: C-x-[LIVMFQ]-x-[LIVMF]-x(2)-[FY]-P-x-D-x(3)-C [The two C's are linked by a disulfide
bond]
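Since the first and last positions of this signature are the two disulfide-bonded cysteines, a match also locates the candidate bonding partners. The following sketch reports both cysteine positions for each match; the regular expression is a hand translation of the pattern above, and the function name and toy sequence are hypothetical.

```python
import re

# C-x-[LIVMFQ]-x-[LIVMF]-x(2)-[FY]-P-x-D-x(3)-C
CYS_LOOP = re.compile(r"C.[LIVMFQ].[LIVMF].{2}[FY]P.D.{3}C")


def disulfide_cysteines(sequence: str) -> list:
    """Return 1-based positions (first Cys, last Cys) for each signature match."""
    pairs = []
    for match in CYS_LOOP.finditer(sequence):
        # match.end() is exclusive and 0-based, so it equals the
        # 1-based position of the final cysteine.
        pairs.append((match.start() + 1, match.end()))
    return pairs


if __name__ == "__main__":
    toy = "AACEIDVTYFPFDQQNCTM"  # hypothetical sequence for illustration
    print(disulfide_cysteines(toy))
```

A pattern match only suggests which cysteines could pair; whether a disulfide bond actually forms would need experimental support.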

[ 1] Stroud R.M., McCarthy M.P., Shuster M. Biochemistry 29:11009-11023(1990).

[ 2] Betz H. Neuron 5:383-392(1990).

[ 3] Dingledine R., Myers S.J., Nicholas R.A. FASEB J. 4:2632-2645(1990).
[ 4] Barnard E.A. Trends Biochem. Sci. 17:368-374(1992). 
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droplets (0.2 to 1 .5 mu-m in diameter) containing mostly triacylglycerol that are surrounded by a phospholipid/oleosin 
annulus. Oleosins may have a structural role in stabilizing the lipid body during desiccation of the seed, by preventing
coalescence of the oil. They may also provide recognition signals for specific lipase anchorage in lipolysis during
seedling growth. Oleosins are found in the monolayer lipid/water interface of oil bodies and probably interact with both
the lipid and phospholipid moieties.

Oleosins are proteins of 16 Kd to 24 Kd and are composed of three domains: an N-terminal hydrophilic region of
variable length (from 30 to 60 residues); a central hydrophobic domain of about 70 residues and a C-terminal
amphipathic region of variable length (from 60 to 100 residues). The central hydrophobic domain is proposed to be made up
of beta-strand structure and to interact with the lipids [2]. It is the only domain whose sequence is conserved and
therefore a section from that domain was selected as a signature pattern.

[1082] Consensus pattern: [AG]-[ST]-x(2)-[AG]-x(2)-[LIVM]-[SAD]-T-P-[LIVMF](4)-F-S-P-[LIVM](3)-P-A

[ 1] Murphy D.J., Keen J.N., O'Sullivan J.N., Au D.M.Y., Edwards E.-W., Jackson P.J., Cummins I., Gibbons T.,
Shaw C.H., Ryan A.J. Biochim. Biophys. Acta 1088:86-94(1991).
[ 2] Tzen J.T.C., Lie G.C., Huang A.H.C. J. Biol. Chem. 267:15626-15634(1992).

[1083] 379. (Orbi_VP5) Orbivirus outer capsid protein VP5

[1084] This paper shows the location of the different capsid proteins and their relation to each other.
[1085] [1] Schoehn G, Moss SR, Nuttall PA, Hewat EA; Virology 1997;235:191-200.
[1086] 380. Orn/DAP/Arg decarboxylases family 2 signatures

Pyridoxal-dependent decarboxylases acting on ornithine, lysine, arginine and related substrates can be classified into
two different families on the basis of sequence similarities [1,2,3]. The second family consists of:

- Eukaryotic ornithine decarboxylase (EC 4.1.1.17) (ODC). ODC catalyzes the transformation of ornithine into
putrescine.

- Prokaryotic diaminopimelic acid decarboxylase (EC 4.1.1.20) (DAPDC). DAPDC catalyzes the conversion of
diaminopimelic acid into lysine, the last step in the biosynthesis of lysine.

- Pseudomonas syringae pv. tabaci protein tabA. tabA is probably involved in the biosynthesis of tabtoxin and is
highly similar to DAPDC.

- Bacterial and plant biosynthetic arginine decarboxylase (EC 4.1.1.19) (ADC). ADC catalyzes the transformation
of arginine into agmatine, the first step in the biosynthesis of putrescine from arginine.

The above proteins, while most probably evolutionarily related, do not share extensive regions of sequence similarity.
Two of the conserved regions were selected as signature patterns. The first pattern contains a conserved lysine residue
which is known, in mouse ODC [4], to be the site of attachment of the pyridoxal-phosphate group. The second pattern
contains a stretch of three consecutive glycine residues and has been proposed to be part of a substrate-binding region
[5].

These enzymes are collectively known as group IV decarboxylases [3].

[1087] Consensus pattern: [FY]-[PA]-x-K-[SACV]-[NHCLFW]-x(4)-[LIVMF]-[LIVMTA]-x(2)-[LIVMA]-x(3)-[GTE] [K is
the pyridoxal-P attachment site]

Consensus pattern: [GS]-x(2,6)-[LIVMSCP]-x(2)-[LIVMF]-[DNS]-[LIVMCA]-G-G-G-[LIVMFY]-[GSTPCEQ]
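The second pattern contains a variable-length gap, x(2,6), meaning between two and six arbitrary residues. The sketch below shows how such a gap could be expressed as a bounded repeat in a regular expression, with the three consecutive glycines captured as a named group. It assumes the pattern reads [GS]-x(2,6)-[LIVMSCP]-x(2)-[LIVMF]-[DNS]-[LIVMCA]-G-G-G-[LIVMFY]-[GSTPCEQ]; the names and toy sequences are hypothetical.

```python
import re

# x(2,6) becomes the bounded repeat .{2,6}; the glycine triplet is named.
GROUP_IV_PATTERN_2 = re.compile(
    r"[GS].{2,6}[LIVMSCP].{2}[LIVMF][DNS][LIVMCA](?P<ggg>GGG)[LIVMFY][GSTPCEQ]"
)


def locate_glycine_triplet(sequence: str):
    """Return (1-based match start, 1-based start of the GGG stretch), or None."""
    match = GROUP_IV_PATTERN_2.search(sequence)
    if match is None:
        return None
    return match.start() + 1, match.start("ggg") + 1


if __name__ == "__main__":
    toy = "MGAAAVKKLDLGGGLGTT"  # hypothetical sequence; gap of three residues
    print(locate_glycine_triplet(toy))
```

The regex engine tries gap lengths within the stated bounds until the remainder of the pattern fits, so a sequence whose gap is shorter than two residues is rejected.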

[ 1] Bairoch A. Unpublished observations (1993). 

[ 2] Martin C., Cami B., Yeh P., Stragier P., Parsot C., Patte J.-C. Mol. Biol. Evol. 5:549-559(1988).
[ 3] Sandmeier E., Hale T.I., Christen P. Eur. J. Biochem. 221:997-1002(1994).

[ 4] Poulin R., Lu L., Ackermann B., Bey P., Pegg A.E. J. Biol. Chem. 267:150-158(1992).
[ 5] Moore R.C., Boyle S.M. J. Bacteriol. 172:4631-4640(1990).

[1088] 381. Osteopontin signature
Osteopontin is an acidic phosphorylated glycoprotein of about 40 Kd which is abundant in the mineral matrix of bones
and which binds tightly to hydroxyapatite [1,2,3]. It is suggested that osteopontin might function as a cell attachment
factor and could play a key role in the adhesion of osteoclasts to the mineral matrix of bone.

Osteopontin-K is a kidney protein which is highly similar to osteopontin and probably also involved in cell adhesion.

As a signature pattern a highly conserved region located at the N-terminal extremity of the mature protein was selected.
[1089] Consensus pattern: [KQ]-x-[TA]-x(2)-[GA]-S-S-E-E-K

[ 1] Butler W.T. Connect. Tissue Res. 23:123-36(1989).
[ 2] Gorski J.P. Calcif. Tissue Int. 50:391-396(1992).
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[ 3] Denhardt D.T., Guo X. FASEB J. 7:1475-1482(1993). 

[1090] 382. Oxysterol-binding protein family signature
A number of eukaryotic proteins that seem to be involved with sterol synthesis and/or its regulation have been found
[1] to be evolutionarily related:

- Mammalian oxysterol-binding protein (OSBP). A protein of about 800 amino-acid residues that binds a variety of
oxysterols: oxygenated derivatives of cholesterol. OSBP seems to play a complex role in the regulation of sterol
metabolism.

- Yeast proteins HES1 and KES1; highly related proteins of 434 residues that seem to play a role in ergosterol
synthesis.

- Yeast OSH1, a protein of 859 residues that also plays a role in ergosterol synthesis.

- Yeast hypothetical protein YHR001w (437 residues).

- Yeast hypothetical protein YHR073w (996 residues).

- Yeast hypothetical protein YKR003w (448 residues).

[1091] All these proteins contain a moderately conserved domain of about 250 residues located in the C-terminal
half of OSBP, OSH1 and YHR073w and in the central section of the other proteins. As a signature pattern, the best
conserved part of this domain was selected, a region that contains a conserved pentapeptide.
[1092] Consensus pattern: E-[KQ]-x-S-H-[HR]-P-P-x-[STACF]-A

[1093] [ 1] Jiang B., Brown J.L., Sheraton J., Fortin N., Bussey H. Yeast 10:341-353(1994). 

[1094] 383. FMN oxidoreductase 

[1095] 384. Oxidoreductase FAD/NAD-binding domain 

Number of members: 250 

[1]
Medline: 92084635
The sequence of squash NADH:nitrate reductase and its relationship to the sequences of other flavoprotein
oxidoreductases. A family of flavoprotein pyridine nucleotide cytochrome reductases.
Hyde GE, Crawford NM, Campbell WH;
J Biol Chem 1991;266:23542-23547.

[2]
Medline: 95111952
Crystal structure of the FAD-containing fragment of corn nitrate reductase at 2.5 A resolution: relationship to other
flavoprotein reductases.
Lu G, Campbell WH, Schneider G, Lindqvist Y;
Structure 1994;2:809-821.

[1096] 385. (oxidored_molyb) Eukaryotic molybdopterin oxidoreductases signature. A number of different eukaryotic
oxidoreductases that require and bind a molybdopterin cofactor have been shown [1] to share a few regions of sequence
similarity. These enzymes are:

- Xanthine dehydrogenase (EC 1.1.1.204), which catalyzes the oxidation of xanthine to uric acid with the concomitant
reduction of NAD. Structurally, this enzyme of about 1300 amino acids consists of at least three distinct domains:
an N-terminal 2Fe-2S ferredoxin-like iron-sulfur binding domain (see <PDOC00175>), a central FAD/NAD-binding
domain and a C-terminal Mo-pterin domain.

- Aldehyde oxidase (EC 1.2.3.1), which catalyzes the oxidation of aldehydes into acids. Aldehyde oxidase is highly
similar to xanthine dehydrogenase in its sequence and domain structure.

- Nitrate reductase (EC 1.6.6.1), which catalyzes the reduction of nitrate to nitrite. Structurally, this enzyme of about
900 amino acids consists of an N-terminal Mo-pterin domain, a central cytochrome b5-type heme-binding domain
(see <PDOC00170>) and a C-terminal FAD/NAD-binding cytochrome reductase domain.

- Sulfite oxidase (EC 1.8.3.1), which catalyzes the oxidation of sulfite to sulfate. Structurally, this enzyme of about
460 amino acids consists of an N-terminal cytochrome b5-binding domain followed by a Mo-pterin domain.

There are a few conserved regions in the sequence of the molybdopterin-binding domain of these enzymes. The pattern
used to detect these proteins is based on one of them. It contains a cysteine residue which could be involved in binding
the molybdopterin cofactor.

[1097] Consensus pattern: [GA]-x(3)-[KRNQHT]-x(11,14)-[LIVMFYWS]-x(8)-[LIVMF]-x-C-x(2)-[DEN]-R-x(2)-[DE]
[1098] [1] Wootton J.C., Nicolson R.E., Cock J.M., Walters D.E., Burke J.F., Doyle W.A., Bray R.C. Biochim. Biophys.
Acta 1057:157-185(1991).

[1099] 386. (oxidored_q1) NADH-Ubiquinone/plastoquinone (complex I), various chains
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This family is part ol complex I which catalyses the transfer of two electrons from NADH to ubiquinone in a reaction 
that is associated with proton translocation across the membrane. Number of members: 1824 

[1]
Medline: 03110040
The NADH:ubiquinone oxidoreductase (complex I) of respiratory chains.
Walker JE;
Q Rev Biophys 1992;25:253-324.
[1100] 387. (oxidored_q3) NADH-ubiquinone/plastoquinone oxidoreductase chain 6. 179 members.
[1101] 388. (oxidored_q5) NADH-ubiquinone oxidoreductase chain 4, amino terminus
[1102] [1] Walker JE; Q Rev Biophys 1992;25:253-324.

[1103] 389. (oxidored_q6) Respiratory-chain NADH dehydrogenase 20 Kd subunit signature. Respiratory-chain NADH
dehydrogenase (EC 1.6.5.3) [1,2] (also known as complex I or NADH-ubiquinone oxidoreductase) is an oligomeric
enzymatic complex located in the inner mitochondrial membrane which also seems to exist in the chloroplast and in
cyanobacteria (as a NADH-plastoquinone oxidoreductase). Among the 25 to 30 polypeptide subunits of this
bioenergetic enzyme complex there is one with a molecular weight of 20 Kd (in mammals) [3], which is a component of the
iron-sulfur (IP) fragment of the enzyme. It seems to bind a 4Fe-4S iron-sulfur cluster. The 20 Kd subunit has been
found to be:

- Nuclear encoded, as a precursor form with a transit peptide in mammals, and in Neurospora crassa.
- Mitochondrial encoded in Paramecium (gene psbG).
- Chloroplast encoded in various higher plants (gene ndhK or psbG).

The 20 Kd subunit is highly similar to [4]:

- Synechocystis strain PCC 6803 proteins psbG1 and psbG2.
- Subunit B of Escherichia coli NADH-ubiquinone oxidoreductase (gene nuoB).
- Subunit NQO6 of Paracoccus denitrificans NADH-ubiquinone oxidoreductase.
- Subunit 7 of Escherichia coli formate hydrogenlyase (gene hycG).
- Subunit I of Escherichia coli hydrogenase-4 (gene hyfI).

As a signature pattern a highly conserved region was selected, located in the central section of this subunit and which
contains a conserved cysteine that is probably involved in the binding of the 4Fe-4S center.

[1104] Consensus pattern: [GN]-x-D-[EAST]-[LIVMF](2)-P-[IV]-D-[LIVMFYW](2)-x-P-x-C-P-[PT] [The C is a putative
4Fe-4S ligand]

[ 1] Ragan C.I. Curr. Top. Bioenerg. 15:1-36(1987).

[ 2] Weiss H., Friedrich T., Hofhaus G., Preis D. Eur. J. Biochem. 197:563-576(1991).

[ 3] Arizmendi J.M., Runswick M.J., Skehel J.M., Walker J.E. FEBS Lett. 301:237-242(1992).

[ 4] Weidner U., Geier S., Ptock A., Friedrich T., Leif H., Weiss H. J. Mol. Biol. 233:109-122(1993).

[1105] 390. p53 tumor antigen signature

The p53 tumor antigen [1 to 5, E1, E2] is a protein found in increased amounts in a wide variety of transformed cells.
It is also detectable in many proliferating nontransformed cells, but it is undetectable or present at low levels in resting
cells. It is frequently mutated or inactivated in many types of cancer. p53 seems to act as a tumor suppressor in some,
but probably not all, tumor types. p53 is probably involved in cell cycle regulation, and may be a trans-activator that
acts to negatively regulate cellular division by controlling a set of genes required for this process.

p53 is a phosphoprotein of about 390 amino acids which can be subdivided into four domains: a highly charged acidic
region of about 75 to 80 residues, a hydrophobic proline-rich domain (position 80 to 150), a central region (from 150
to about 300), and a highly basic C-terminal region. The sequence of p53 is well conserved in vertebrate species;
attempts to identify p53 in other eukaryotic phyla have so far been unsuccessful.

As a signature pattern for p53 a perfectly conserved stretch of 13 residues located in the central region of the protein
was selected. This region, known as domain IV in [3], is involved (along with an adjacent region) in the binding of the
large T antigen of SV40. In man this region is the focus of a variety of point mutations in cancerous tumors.
[1106] Consensus pattern: M-C-N-S-S-C-M-G-G-M-N-R-R 
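Because this signature is a fixed 13-residue string and the region is described as a focus of point mutations, it can be useful to look for near matches as well as exact ones. The following sketch slides a window along a sequence and reports windows that differ from the signature at no more than a chosen number of positions. The function name, the `max_mismatches` parameter and the toy sequences are hypothetical.

```python
P53_SIGNATURE = "MCNSSCMGGMNRR"


def scan_with_mismatches(sequence: str, motif: str = P53_SIGNATURE,
                         max_mismatches: int = 1) -> list:
    """Return (0-based start, [(offset, expected, observed), ...]) for each hit."""
    hits = []
    for start in range(len(sequence) - len(motif) + 1):
        window = sequence[start:start + len(motif)]
        # Collect every position where the window departs from the motif.
        diffs = [(i, motif[i], window[i])
                 for i in range(len(motif)) if window[i] != motif[i]]
        if len(diffs) <= max_mismatches:
            hits.append((start, diffs))
    return hits


if __name__ == "__main__":
    exact = "AAA" + P53_SIGNATURE + "KK"       # hypothetical flanking residues
    variant = "AAA" + "MCNSSCMGGMNHR" + "KK"   # hypothetical single substitution
    print(scan_with_mismatches(exact))
    print(scan_with_mismatches(variant))
```

Setting `max_mismatches=0` reduces the scan to an exact search for the signature.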

[ 1] Levine A.J., Momand J., Finlay C.A. Nature 351:453-456(1991).

[ 2] Levine A.J., Momand J. Biochim. Biophys. Acta 1032:119-136(1990). 
[ 3] Soussi T., Caron De Fromentel C, May P. Oncogene 5:945-952(1990). 
[ 4] Lane D.P., Benchimol S. Genes Dev. 4:1-8(1990). 
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[ 5] Ulrich S.J., Anderson C.W., Mercer W.E., Appella E. J. Biol. Cherri. 267:15259-15262(1992). 
[1107] 391. (P5CR) Delta 1-pyrroline-5-carboxylate reductase signature

Delta 1-pyrroline-5-carboxylate reductase (P5CR) (EC 1.5.1.2) [1,2] is the enzyme that catalyzes the terminal step in
the biosynthesis of proline from glutamate, the NAD(P)H-dependent reduction of 1-pyrroline-5-carboxylate into proline.
The sequences of P5CR from eubacteria (gene proC), archaebacteria and eukaryotes show only a moderate level of
overall similarity. As a signature pattern, the best conserved region located in the C-terminal section of P5CR was
selected.

[1108] Consensus pattern: [PALF]-x(2,3)-[LIV]-x(3)-[LIVM]-[STAC]-[STV]-x-[GAN]-G-x-T-x(2)-[AG]-[LIV]-x(2)-
[LMF]-[DENQK]

[ 1] Delauney A.J., Verma D.P. Mol. Gen. Genet. 221:299-305(1990).

[ 2] Savioz A., Jeenes D.J., Kocher H.P., Haas D. Gene 86:107-111(1990).

[1109] 392. Poly-adenylate binding protein, unique domain.

[1110] 393. (PAL) Phenylalanine and histidine ammonia-lyases active site 

Phenylalanine ammonia-lyase (EC 4.3.1.5) (PAL) is a key enzyme of plant and fungal phenylpropanoid metabolism
which is involved in the biosynthesis of a wide variety of secondary metabolites such as flavonoids, furanocoumarin
phytoalexins and cell wall components. These compounds have many important roles in plants during normal growth
and in responses to environmental stress. PAL catalyzes the removal of an ammonia group from phenylalanine to form
trans-cinnamate.

Histidine ammonia-lyase (EC 4.3.1.3) (histidase) catalyzes the first step in histidine degradation, the removal of an
ammonia group from histidine to produce urocanic acid.

The two types of enzymes are functionally and structurally related [1]. They are the only enzymes which are known to
have the modified amino acid dehydroalanine (DHA) in their active site. A serine residue has been shown [2,3,4] to be
the precursor of this essential electrophilic moiety. The region around this active site residue is well conserved and
can be used as a signature pattern.

[1111] Consensus pattern: G-[STG]-[LIVM]-[STG]-[AC]-S-G-[DH]-L-x-P-L-[SA]-x(2)-[SA] [S is the active site residue] 

[ 1] Taylor R.G., Lambert M.A., Sexsmith E., Sadler S.J., Ray P.N., Mahuran D.J., McInnes R.R. J. Biol. Chem.
265:18192-18199(1990).

[ 2] Langer M., Reck G., Reed J., Retey J. Biochemistry 33:6462-6467(1994).

[ 3] Schuster B., Retey J. FEBS Lett. 349:252-254(1994).

[ 4] Taylor R.G., McInnes R.R. J. Biol. Chem. 269:27473-27477(1994).

[1112] 394. PAS domain 

-!- CAUTION. This family does not currently match all known examples of PAS domains.
PAS motifs appear in archaea, eubacteria and eukarya. Probably
the most surprising identification of a PAS domain was that in
EAG-like K+-channels [1,3].
Number of members: 308

[1]
Medline: 97446881
PAS domain S-boxes in Archaea, Bacteria and sensors for oxygen and redox.
Zhulin IB, Taylor BL, Dixon R;
Trends Biochem Sci 1997;22:331-333.

[2]
Medline: 95275818
1.4 A structure of photoactive yellow protein, a cytosolic photoreceptor: unusual fold, active site, and chromophore.
Borgstahl GE, Williams DR, Getzoff ED;
Biochemistry 1995;34:6278-6287.

[3]
Medline: 98044337
PAS: a multifunctional domain family comes to light.
Ponting CP, Aravind L;
Curr Biol 1997;7:674-677.
[1113] 395. (PBP) Phosphatidylethanolamine-binding protein family signature

Mammalian phosphatidylethanolamine-binding protein (also known as basic cytosolic 21 Kd protein) is a 186 residue
protein found in a variety of tissues [1]. It binds hydrophobic ligands, such as phosphatidylethanolamine, but also seems
[2] to bind nucleotides such as GTP and FMN. It is suggested that it could act in membrane remodeling during growth
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and maturation. This protein belongs to a family that also includes: 

- Drosophila antennal protein A5, a putative odorant-binding protein.
- Onchocerca volvulus antigen Ov-16 and the related proteins D1, D2 and D3.
- Plasmodium falciparum putative phosphatidylethanolamine-binding protein.
- Toxocara canis secreted antigen TES-26. This larval protein has been shown to bind phosphatidylethanolamine.
- Yeast protein DKA1 (also known as NSP1 or TFS1). The function of this protein is not very clear.
- Yeast hypothetical protein YLR179C.
- Caenorhabditis elegans hypothetical protein F40A3.3.

As a signature pattern, the best conserved region was selected, which is located at the end of the first third of the
sequence of these proteins.

[1114] Consensus pattern: [FYL]-x-[LV]-[LIVF]-x-[TIV]-[DC]-P-D-x-P-[SN]-x(10)-H

[ 1] Seddiqi N., Bollengier F., Alliel P.M., Perin J.P., Bonnet F., Bucquoy S., Jolles P., Schoentgen F. J. Mol. Evol.
39:655-660(1994).

[ 2] Schoentgen F., Jolles P. FEBS Lett. 369:22-6(1995).

[1115] 396. PCI domain 
This domain has also been called the PINT motif (Proteasome,
Int-6, Nip-1 and TRIP-15) [1].
Number of members: 40

[1]
Medline: 98308842
The PCI domain: a common theme in three multiprotein complexes.
Hofmann K, Bucher P;
Trends Biochem Sci 1998;23:204-205.

[2]
Medline: 98266368
Homologues of 26S proteasome subunits are regulators of transcription and translation.
Aravind L, Ponting CP;
Protein Sci 1998;7:1250-1254.
[1116] 397. (PCMT) Protein-L-isoaspartate (D-aspartate) O-methyltransferase signature. Protein-L-isoaspartate
(D-aspartate) O-methyltransferase (EC 2.1.1.77) (PCMT) [1] (which is also known as L-isoaspartyl protein carboxyl
methyltransferase) is an enzyme that catalyzes the transfer of a methyl group from S-adenosylmethionine to the free carboxyl
groups of D-aspartyl or L-isoaspartyl residues in a variety of peptides and proteins. The enzyme does not act on normal
L-aspartyl residues. L-isoaspartyl and D-aspartyl are the products of the spontaneous deamidation and/or isomerization
of normal L-aspartyl and L-asparaginyl residues in proteins. PCMT plays a role in the repair and/or degradation of these
damaged proteins; the enzymatic methyl esterification of the abnormal residues can lead to their conversion to normal
L-aspartyl residues. PCMT is a well-conserved and widely distributed cytosolic protein of about 24 Kd. As a signature
pattern, a conserved region in the central part of this enzyme has been developed.
[1117] Consensus pattern: [GSA]-D-G-x(2)-G-[FYWV]-x(3)-[AS]-P-[FY]-[DN]-x-I

[1118] [ 1] Kagan R.M., McFadden H.J., McFadden P.N., O'Connor C, Clarke S. Comp. Biochem. Physiol. 117b: 
379-385(1997). 

[1119] 398. (PCNA) Proliferating cell nuclear antigen signatures
Proliferating cell nuclear antigen (PCNA) [1,2] is a protein involved in DNA replication by acting as a cofactor for DNA
polymerase delta, the polymerase responsible for leading strand DNA replication.

A similar protein exists in yeast (gene POL30) [3] and is associated with polymerase III, the yeast analog of polymerase
delta. In baculoviruses the ETL protein has been shown [4] to be highly related to PCNA and is probably associated
with the viral encoded DNA polymerase. A homolog of PCNA is also found in archaebacteria.
As signatures for this family of proteins, two conserved regions were selected, both located in the N-terminal section. The
second one has been proposed to bind DNA.

[1120] Consensus pattern: [GA]-[LIVMF]-x-[LIVMA]-x-[SAV]-[LIVM]-D-x-[NSAE]-[HKR]-[VI]-x-[LY]-[VGA]-x-[LIVM]-x-[LIVM]-x(4)-F

Consensus pattern: [RKA]-C-[DE]-[RH]-x(3)-[LIVMF]-x(3)-[LIVM]-x-[SGAN]-[LIVMF]-x-K-[LIVMF](2)
[ 1] Bravo R., Frank R., Blundell P.A., McDonald-Bravo H. Nature 326:515-517(1987).

[ 2] Suzuka I., Hata S., Matsuoka M., Kosugi S., Hashimoto J. Eur. J. Biochem. 195:571-575(1991).

[ 3] Bauer G.A., Burgess P.M.J. Nucleic Acids Res. 18:261-265(1990).

[ 4] O'Reilly D.R., Crawford A.M., Miller L.K. Nature 337:606-606(1989). 

[1121] 399. (PDT) Prephenate dehydratase signatures

Prephenate dehydratase (EC 4.2.1.51) (PDT) catalyzes the decarboxylation of prephenate into phenylpyruvate. In microorganisms PDT is involved in the terminal pathway of the biosynthesis of phenylalanine. In some bacteria such as Escherichia coli PDT is part of a bifunctional enzyme (P-protein) that also catalyzes the transformation of chorismate into prephenate (chorismate mutase) while in other bacteria it is a monofunctional enzyme. The sequence of monofunctional PDT aligns well with the C-terminal part of that of P-proteins [1].

As signature patterns for PDT two conserved regions were selected. The first region contains a conserved threonine which has been said to be essential for the activity of the enzyme in E. coli. The second region includes a conserved glutamate. Both regions are in the C-terminal part of PDT.

[1122] Consensus pattern: [FY]-x-[LIVM]-x(2)-[LIVM]-x(5)-[DN]-x(5)-T-R-F-[LIVMW]-x-[LIVM] 
[1123] [ 1] Fischer R.S., Zhao G., Jensen R.A. J. Gen. Microbiol. 137:1293-1301(1991). 
[1124] 400. PDZ domain (Also known as DHR or GLGF). 
[1125] PDZ domains are found in diverse signaling proteins. 
[1126] [1] Ponting CP, Phillips C, Davies KE, Blake DJ; Bioessays 1997;19:469-479.

[2] Doyle DA, Lee A, Lewis J, Kim E, Sheng M, MacKinnon R; Cell 1996;85:1067-1076.

[3] Ponting CP; Protein Sci 1997;6:464-468.

[1127] 401. (PPDK_N_term) PEP-utilizing enzymes signatures

A number of enzymes that catalyze the transfer of a phosphoryl group from phosphoenolpyruvate (PEP) via a phospho- 
histidine intermediate have been shown to be structurally related [1,2,3,4]. These enzymes are: 

- Pyruvate,orthophosphate dikinase (EC 2.7.9.1) (PPDK). PPDK catalyzes the reversible phosphorylation of pyruvate and phosphate by ATP to PEP and diphosphate. In plants PPDK functions in the direction of the formation of PEP, which is the primary acceptor of carbon dioxide in C4 and crassulacean acid metabolism plants. In some bacteria, such as Bacteroides symbiosus, PPDK functions in the direction of ATP synthesis.
- Phosphoenolpyruvate synthase (EC 2.7.9.2) (pyruvate,water dikinase). This enzyme catalyzes the reversible phosphorylation of pyruvate by ATP to form PEP, AMP and phosphate, an essential step in gluconeogenesis when pyruvate and lactate are used as a carbon source.
- Phosphoenolpyruvate-protein phosphotransferase (EC 2.7.3.9). This is the first enzyme of the phosphoenolpyruvate-dependent sugar phosphotransferase system (PTS), a major carbohydrate transport system in bacteria. The PTS catalyzes the phosphorylation of incoming sugar substrates concomitant with their translocation across the cell membrane. The general mechanism of the PTS is the following: a phosphoryl group from PEP is transferred to enzyme-I (EI) of PTS which in turn transfers it to a phosphoryl carrier protein (HPr). Phospho-HPr then transfers the phosphoryl group to a sugar-specific permease.

All these enzymes share the same catalytic mechanism: they bind PEP and transfer the phosphoryl group from it to a histidine residue. The sequence around that residue is highly conserved and can be used as a signature pattern for these enzymes. As a second signature pattern a conserved region was selected in the C-terminal part of the PEP-utilizing enzymes. The biological significance of this region is not yet known.

[1128] Consensus pattern: G-[GA]-x-[TN]-x-H-[STA]-[STAV]-[LIVM](2)-[STAV]-[RG] [H is phosphorylated] 

Consensus pattern: [DEQSK]-x-[LIVMF]-S-[LIVMF]-G-[ST]-N-D-[LIVM]-x-Q-[LIVMFYGT]-[STALIV]-[LIVMF]-[GAS]-x(2)-R
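Several patterns in this listing annotate one residue with a functional role, such as the phosphorylated histidine in the first PEP-utilizing enzyme signature. As a minimal sketch, the annotated residue can be wrapped in a capture group so that a scan reports its position as well as the match. The regular expression is a hand translation of the first pattern above; the function name and the test sequence are hypothetical.

```python
import re

# G-[GA]-x-[TN]-x-H-[STA]-[STAV]-[LIVM](2)-[STAV]-[RG], with the
# phosphorylated histidine wrapped in a capture group.
PEP_HIS_SIGNATURE = re.compile(r"G[GA].[TN].(H)[STA][STAV][LIVM]{2}[STAV][RG]")


def phosphohistidine_positions(sequence):
    """Return 1-based positions of the histidine in each signature match."""
    return [m.start(1) + 1 for m in PEP_HIS_SIGNATURE.finditer(sequence)]


# Hypothetical sequence fragment, made up for illustration only.
positions = phosphohistidine_positions("MAGGRTSHSSLIARKK")
```

In this made-up fragment the signature starts at the third residue, so the histidine is reported at position 8. Note that `finditer` returns non-overlapping matches only, which is assumed to be sufficient for a signature of this length.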

[ 1] Reizer J., Hoischen C., Reizer A., Pham T.N., Saier M.H. Jr. Protein Sci. 2:506-521(1993).

[ 2] Reizer J., Reizer A., Merrick M.J., Plunkett G. III, Rose D.J., Saier M.H. Jr. Gene 181:103-108(1996).

[ 3] Pocalyko D.J., Carroll L.J., Martin B.M., Babbitt P.C., Dunaway-Mariano D. Biochemistry 29:10757-10765(1990).

[ 4] Niersbach M., Kreuzaler F., Geerse R.H., Postma P., Hirsch H.J. Mol. Gen. Genet. 232:332-336(1992).
[1129] 402. (PEPCK ATP) Phosphoenolpyruvate carboxykinase (ATP) signature 

Phosphoenolpyruvate carboxykinase (ATP) (EC 4.1.1.49) (PEPCK) [1] catalyzes the formation of phosphoenolpyruvate by decarboxylation of oxaloacetate while hydrolyzing ATP, a rate limiting step in gluconeogenesis (the biosynthesis of glucose).
The sequence of this enzyme has been obtained from Escherichia coli, yeast, and Trypanosoma brucei; these three sequences are evolutionarily related and share many regions of similarity. As a signature pattern a highly conserved
[...] beginning of the pattern [...] ATP-binding domain [2] (see <PDOC00017>) [...]

[1130] Consensus pattern: [...]

Note: phosphoenolpyruvate carboxykinase (GTP) (EC 4.1.1.32), an enzyme that catalyzes the same reaction, but [...]

[ 1] Medina V., [...] 172:7151-7156(1990).

[ 2] [...] 256:126-143(1996).

[1131] 403. (PEPcase) Phosphoenolpyruvate carboxylase [...]
Phosphoenolpyruvate carboxylase (EC 4.1.1.31) (PEPcase) catalyzes the irreversible [...] active site [...]

[1132] Consensus pattern: [...] [K is an active site residue]

Consensus pattern: [...]

[1133] [ 1] Terada K., Izui K. Eur. [...] 295(1990).

[...] yeast mitochondrial protein PET112 [1], which [...] at the level of translation.

[...] Mycoplasma [...] 36 Kd subunits. In mammals it is a [...] Kd subunits. In human there are three [...]

[1138] Consensus pattern: [...]
[1139] 406. (PGAM) Phosphoglycerate mutase family phosphohistidine signature

Phosphoglycerate mutase (PGAM) and bisphosphoglycerate mutase (BPGM) are structurally related enzymes which catalyze reactions involving the transfer of phospho groups between the three carbon atoms of phosphoglycerate [1,2]. Both enzymes can catalyze three different reactions, although in different proportions:

- The isomerization of 2-phosphoglycerate (2-PGA) to 3-phosphoglycerate (3-PGA) with 2,3-diphosphoglycerate 
(2,3-DPG) as the primer of the reaction. 

- The synthesis of 2,3-DPG from 1,3-DPG with 3-PGA as a primer.

- The degradation of 2,3-DPG to 3-PGA (phosphatase EC 3.1.3.13 activity).

In mammals, PGAM is a dimeric protein. There are two isoforms of PGAM: the M (muscle) and B (brain) forms. In 
yeast, PGAM is a tetrameric protein. BPGM is a dimeric protein and is found mainly in erythrocytes where it plays a 
major role in regulating hemoglobin oxygen affinity as a consequence of controlling 2,3-DPG concentration.

The catalytic mechanism of both PGAM and BPGM involves the formation of a phosphohistidine intermediate [3]. 
The bifunctional enzyme 6-phosphofructo-2-kinase / fructose-2,6-bisphosphatase (EC 2.7.1.105 and EC 3.1.3.46) (PF2K) [4] catalyzes both the synthesis and the degradation of fructose-2,6-bisphosphate. PF2K is an important enzyme in the regulation of hepatic carbohydrate metabolism. Like PGAM/BPGM, the fructose-2,6-bisphosphatase reaction involves a phosphohistidine intermediate and the phosphatase domain of PF2K is structurally related to PGAM/BPGM.

The bacterial enzyme alpha-ribazole-5'-phosphate phosphatase (gene cobC) which is involved in cobalamin biosyn- 
thesis also belongs to this family [5]. 

A signature pattern was built around the phosphohistidine residue. 
[1140] Consensus pattern: [LIVM]-x-R-H-G-[EQ]-x(3)-N [H is the phosphohistidine residue]

Note: some organisms harbor a form of PGAM independent of 2,3-DPG; this enzyme is not related to the family described above [6].

[ 1] Le Boulch P., Joulin V., Garel M.-C., Rosa J., Cohen-Solal M. Biochem. Biophys. Res. Commun. 156:874-881(1988).

[ 2] White M.F., Fothergill-Gilmore L.A. FEBS Lett. 229:383-387(1988).
[ 3] Rose Z.B. Meth. Enzymol. 87:43-51(1982).

[ 4] Bazan J.F., Fletterick R.J., Pilkis S.J. Proc. Natl. Acad. Sci. U.S.A. 86:9642-9646(1989).
[ 5] O'Toole G.A., Trzebiatowski J.R., Escalante-Semerena J.C. J. Biol. Chem. 269:26503-26511(1994).

[ 6] Grana X., De Lecea L., El-Maghrabi M.R., Urena J.M., Caelles C., Carreras J., Puigdomenech P., Pilkis S.J., Climent F. J. Biol. Chem. 267:12797-12803(1992).

[1141] 407. (PGI) Phosphoglucose isomerase signatures 

Phosphoglucose isomerase (EC 5.3.1.9) (PGI) [1,2] is a dimeric enzyme that catalyzes the reversible isomerization of glucose-6-phosphate and fructose-6-phosphate. PGI is involved in different pathways: in most higher organisms it is involved in glycolysis; in mammals it is involved in gluconeogenesis; in plants in carbohydrate biosynthesis; in some bacteria it provides a gateway for fructose into the Entner-Doudoroff pathway. PGI has been shown [3] to be identical to neuroleukin, a neurotrophic factor which supports the survival of various types of neurons.

The sequence of PGI from many species ranging from bacteria to mammals is available and has been shown to be highly conserved. As signature patterns for this enzyme two conserved regions were selected: the first region is located in the central section of PGI, while the second one is located in its C-terminal section.
[1142] Consensus pattern: [DENS]-x-[LIVM]-G-G-R-[FY]-S-[LIVMT]-x-[STA]-[PSAC]-[LIVMA]-G

Consensus pattern: [GS]-x-[LIVM]-[LIVMFYW]-x(4)-[FY]-[DN]-Q-x-G-V-E-x(2)-K

[ 1] Achari A., Marshall S.E., Muirhead H., Palmieri R.H., Noltmann E.A. Philos. Trans. R. Soc. Lond., B, Biol. Sci. 293:145-157(1981).

[ 2] Smith M.W., Doolittle R.F. J. Mol. Evol. 34:544-545(1992).
[ 3] Faik P., Walker J.I.H., Redmill A.A.M., Morgan M.J. Nature 332:455-456(1988).

[1143] 408. (PGK) Phosphoglycerate kinase signature 

Phosphoglycerate kinase (EC 2.7.2.3) (PGK) [1] catalyzes the second step in the second phase of glycolysis, the reversible conversion of 1,3-diphosphoglycerate to 3-phosphoglycerate with generation of one molecule of ATP. PGK is found in all living organisms and its sequence has been highly conserved throughout evolution. It is a two-domain protein; each domain is composed of six repeats of an alpha/beta structural motif. As a signature pattern for PGK, a conserved region in the N-terminal region was selected.
Consensus pattern: [KRHGTCVN]-[VT]-[LIVMF]-[LIVMC]-R-x-D-x-N-[SACV]-P
[1144] [ 1] Watson H.C., Littlechild J.A. Biochem. Soc. Trans. 18:187-190(1990).

[1145] 409. (PGM PMM) Phosphoglucomutase and phosphomannomutase phosphoserine signature

- Phosphoglucomutase (EC 5.4.2.2) (PGM). PGM is an enzyme responsible for the conversion of D-glucose 1-phosphate into D-glucose 6-phosphate. PGM participates in both the breakdown and synthesis of glucose [1].

- Phosphomannomutase (EC 5.4.2.8) (PMM). PMM is an enzyme responsible for the conversion of D-mannose 1-phosphate into D-mannose 6-phosphate. PMM is required for different biosynthetic pathways in bacteria. For example in enterobacteria such as Escherichia coli there are two different genes coding for this enzyme: rfbK, which is involved in the synthesis of the O antigen of lipopolysaccharide, and cpsG, which is required for the synthesis of the M antigen capsular polysaccharide [2]. In Pseudomonas aeruginosa PMM (gene algC) is involved in the biosynthesis of the alginate layer [3] and in Xanthomonas campestris (gene xanA) it is involved in the biosynthesis of xanthan [4]. In Rhizobium strain NGR234 (gene noeK) it is involved in the biosynthesis of the nod factor.

- Phosphoacetylglucosamine mutase (EC 5.4.2.3) which converts N-acetyl-D-glucosamine 1-phosphate into the 6-phosphate isomer.




The catalytic mechanism of both PGM and PMM involves the formation of a phosphoserine intermediate [1]. The sequence around the serine residue is well conserved and can be used as a signature pattern.
In addition to PGM and PMM there are at least three uncharacterized proteins that belong to this family [5,6]:

- Urease operon protein ureC from Helicobacter pylori.
- Escherichia coli protein mrsA.
- Paramecium tetraurelia parafusin, a phosphoglycoprotein involved in exocytosis.
- A Methanococcus vannielii hypothetical protein in the 3' region of the gene for ribosomal protein S10.

[1146] Consensus pattern: [GSA]-[LIVM]-x-[LIVM]-[ST]-[PGA]-S-H-x-P-x(4)-[GNHE] [S is the phosphoserine residue]

Note: PMM from fungi do not belong to this family. 

[ 1] Dai J.B., Liu Y., Ray W.J. Jr., Konno M. J. Biol. Chem. 267:6322-6337(1992).

[ 2] Stevenson G., Lee S.J., Romana L.K., Reeves P.R. Mol. Gen. Genet. 227:173-180(1991).

[ 3] Zielinski N.A., Chakrabarty A.M., Berry A. J. Biol. Chem. 266:9754-9763(1991).

[ 4] Koeplin R., Arnold W., Hoette B., Simon R., Wang G., Puehler A. J. Bacteriol. 174:191-199(1992).

[ 5] Bairoch A. Unpublished observations (1993).

[ 6] Subramanian S.V., Wyroba E., Andersen A.P., Satir B.H. Proc. Natl. Acad. Sci. U.S.A. 91:9832-9836(1994).

[1147] 410. PH domain profile 

The 'pleckstrin homology' (PH) domain is a domain of about 100 residues that occurs in a wide range of proteins 
involved in intracellular signaling or as constituents of the cytoskeleton [1 to 7]. 
The function of this domain is not clear; several putative functions have been suggested:

- binding to the beta/gamma subunit of heterotrimeric G proteins,
- binding to lipids, e.g. phosphatidylinositol-4,5-bisphosphate,
- binding to phosphorylated Ser/Thr residues,
- attachment to membranes by an unknown mechanism.

It is possible that different PH domains have totally different ligand requirements. 

The 3D structure of several PH domains has been determined [8]. All known cases have a common structure consisting of two perpendicular anti-parallel beta sheets, followed by a C-terminal amphipathic helix. The loops connecting the beta-strands differ greatly in length, making the PH domain relatively difficult to detect. There are no totally invariant residues within the PH domain.

Proteins reported to contain one or more PH domains belong to the following families:

- Pleckstrin, the protein where this domain was first detected, is the major substrate of protein kinase C in platelets. Pleckstrin is one of the rare proteins to contain two PH domains.
- Ser/Thr protein kinases such as the Akt/Rac family, the beta-adrenergic receptor kinases, the mu isoform of PKC and the trypanosomal NrkA family.
- Tyrosine protein kinases belonging to the Btk/Itk/Tec subfamily.
- Insulin Receptor Substrate 1 (IRS-1).

- Regulators of small G-proteins like guanine nucleotide releasing factor GNRP (Ras-GRF) (which contains 2 PH domains), guanine nucleotide exchange proteins like vav, dbl, SoS and yeast CDC24, GTPase activating proteins like rasGAP and BEM2/IPL2, and the human break point cluster protein bcr.
- Cytoskeletal proteins such as dynamin (see <PDOC00362>), Caenorhabditis elegans kinesin-like protein unc-104 (see <PDOC00343>), spectrin beta-chain, syntrophin (2 PH domains) and yeast nuclear migration protein NUM1.
- Mammalian phosphatidylinositol-specific phospholipase C (PI-PLC) (see <PDOC50007>) isoforms gamma and delta. Isoform gamma contains two PH domains; the second one is split into two parts separated by about 400 residues.
- Oxysterol binding proteins OSBP, yeast OSH1 and YHR073w.
- Mouse protein citron, a putative rho/rac effector that binds to the GTP-bound forms of rho and rac.
- Several yeast proteins involved in cell cycle regulation and bud formation like BEM2, BEM3, BUD4 and the BEM1-binding proteins BOI2 (BEB1) and BOI1 (BOB1).
- Caenorhabditis elegans protein MIG-10.
- Caenorhabditis elegans hypothetical proteins C04D8.1, K06H7.4 and ZK632.12.
- Yeast hypothetical proteins YBR129c and YHR155w.

The profile for the PH domain, which has been developed by Toby Gibson at the EMBL, covers the total length of the domain. Several proteins contain large insertions in the PH domain and are thus difficult to detect with this profile. In some of these cases, the profile will align only to one half of the PH domain.

- Sequences known to belong to this class detected by the pattern: ALL. But it should be noted that while all sequences containing PH domains are detected, not all PH domains are. Some of the split domains lie below the cutoff threshold.
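The effect of a cutoff threshold on split domains can be illustrated with a toy position-specific scoring profile. The sketch below is not the PH domain profile, whose weights are not given in the text. `TOY_PROFILE`, `CUTOFF`, the mismatch penalty and both sequences are made-up assumptions, and the scan is ungapped, whereas real profiles allow insertions with gap penalties.

```python
# Toy four-column profile: each column maps favoured residues to a score.
TOY_PROFILE = [
    {"W": 3.0},
    {"L": 2.0, "I": 2.0},
    {"K": 1.5, "R": 1.5},
    {"G": 2.0},
]
CUTOFF = 6.0  # hypothetical detection threshold


def best_window(sequence, profile, mismatch=-1.0):
    """Slide the profile along the sequence and return (best score, start)."""
    width = len(profile)
    best_score, best_start = float("-inf"), None
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        score = sum(column.get(residue, mismatch)
                    for column, residue in zip(profile, window))
        if score > best_score:
            best_score, best_start = score, start
    return best_score, best_start


# Contiguous motif: all four columns align in one window.
intact_score, intact_start = best_window("AAWLKGAA", TOY_PROFILE)

# Same motif split by an insertion: only half of it fits in any window.
split_score, split_start = best_window("AAWLAAAKGAA", TOY_PROFILE)

intact_detected = intact_score >= CUTOFF
split_detected = split_score >= CUTOFF
```

With these made-up numbers the intact motif scores above the cutoff, while the split motif aligns to only one half of the profile and falls below it, which mirrors the behaviour described for PH domains carrying large insertions.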

[ 1] Mayer B.J., Ren R., Clark K.L., Baltimore D. Cell 73:629-630(1993).

[ 2] Haslam R.J., Koide H.B., Hemmings B.A. Nature 363:309-310(1993).

[ 3] Musacchio A., Gibson T.J., Rice P., Thompson J., Saraste M. Trends Biochem. Sci. 18:343-348(1993).
[ 4] Gibson T.J., Hyvonen M., Musacchio A., Saraste M., Birney E. Trends Biochem. Sci. 19:349-353(1994).
[ 5] Pawson T. Nature 373:573-580(1995).
[ 6] Ingley E., Hemmings B.A. J. Cell. Biochem. 56:436-443(1994).
[ 7] Saraste M., Hyvonen M. Curr. Opin. Struct. Biol. 5:403-408(1995).
[ 8] Riddihough G. Nat. Struct. Biol. 1:755-757(1994).

411. PHD-finger 
[1] 

Medline: 95216093 

The PHD finger: implications for chromatin-mediated transcriptional regulation.
Aasland R, Gibson TJ, Stewart AF; 

Trends Biochem Sci 1995;20:56-59. 
Number of members: 181 

[1148] 412. (PI-PLC-X) Phosphatidylinositol-specific phospholipase C profiles
Phosphatidylinositol-specific phospholipase C (EC 3.1.4.11), a eukaryotic intracellular enzyme, plays an important role in signal transduction processes [1]. It catalyzes the hydrolysis of 1-phosphatidyl-D-myo-inositol-3,4,5-triphosphate into the second messenger molecules diacylglycerol and inositol-1,4,5-triphosphate. This catalytic process is tightly regulated by reversible phosphorylation and binding of regulatory proteins [2 to 4].

In mammals, there are at least 6 different isoforms of PI-PLC; they differ in their domain structure, their regulation, and their tissue distribution. Lower eukaryotes also possess multiple isoforms of PI-PLC.

All eukaryotic PI-PLCs contain two regions of homology, sometimes referred to as 'X-box' and 'Y-box'. The order of these two regions is always the same (NH2-X-Y-COOH), but the spacing is variable. In most isoforms, the distance between these two regions is only 50-100 residues but in the gamma isoforms one PH domain, two SH2 domains, and one SH3 domain are inserted between the two PLC-specific domains. The two conserved regions have been shown to be important for the catalytic activity. At the C-terminal of the Y-box, there is a C2 domain (see <PDOC00380>) possibly involved in Ca-dependent membrane attachment.

Profile analysis shows that sequences with significant similarity to the X-box domain occur also in prokaryotic and trypanosome PI-specific phospholipases C. Apart from this region, the prokaryotic enzymes show no similarity to their eukaryotic counterparts.
Two profiles were developed, one covering the X-box, the other the Y-box.


[ 1] Meldrum E., Parker P.J., Carozzi A. Biochim. Biophys. Acta 1092:49-71(1991).
[ 2] Rhee S.G., Choi K.D. Adv. Second Messenger Phosphoprotein Res. 26:35-61(1992).
[ 3] Rhee S.G., Choi K.D. J. Biol. Chem. 267:12393-12396(1992).

[ 4] Sternweis P.C., Smrcka A.V. Trends Biochem. Sci. 17:502-506(1992).


[1149] 413. (PI-PLC-Y) Phosphatidylinositol-specific phospholipase C profiles
Phosphatidylinositol-specific phospholipase C (EC 3.1.4.11), a eukaryotic intracellular enzyme, plays an important role in signal transduction processes [1]. It catalyzes the hydrolysis of 1-phosphatidyl-D-myo-inositol-3,4,5-triphosphate into the second messenger molecules diacylglycerol and inositol-1,4,5-triphosphate. This catalytic process is tightly regulated by reversible phosphorylation and binding of regulatory proteins [2 to 4].

In mammals, there are at least 6 different isoforms of PI-PLC; they differ in their domain structure, their regulation, and their tissue distribution. Lower eukaryotes also possess multiple isoforms of PI-PLC.

All eukaryotic PI-PLCs contain two regions of homology, sometimes referred to as 'X-box' and 'Y-box'. The order of these two regions is always the same (NH2-X-Y-COOH), but the spacing is variable. In most isoforms, the distance between these two regions is only 50-100 residues but in the gamma isoforms one PH domain, two SH2 domains, and one SH3 domain are inserted between the two PLC-specific domains. The two conserved regions have been shown to be important for the catalytic activity. At the C-terminal of the Y-box, there is a C2 domain (see <PDOC00380>) possibly involved in Ca-dependent membrane attachment.

Profile analysis shows that sequences with significant similarity to the X-box domain occur also in prokaryotic and trypanosome PI-specific phospholipases C. Apart from this region, the prokaryotic enzymes show no similarity to their eukaryotic counterparts.
Two profiles were developed, one covering the X-box, the other the Y-box.

[ 1] Meldrum E., Parker P.J., Carozzi A. Biochim. Biophys. Acta 1092:49-71(1991).
[ 2] Rhee S.G., Choi K.D. Adv. Second Messenger Phosphoprotein Res. 26:35-61(1992).
[ 3] Rhee S.G., Choi K.D. J. Biol. Chem. 267:12393-12396(1992).
[ 4] Sternweis P.C., Smrcka A.V. Trends Biochem. Sci. 17:502-506(1992).

[1150] 414. (PK) Pyruvate kinase active site signature 

Pyruvate kinase (EC 2.7.1.40) (PK) [1] catalyzes the final step in glycolysis, the conversion of phosphoenolpyruvate to pyruvate with the concomitant phosphorylation of ADP to ATP. PK requires both magnesium and potassium ions for its activity. PK is found in all living organisms. In vertebrates there are four tissue-specific isozymes: L (liver), R (red cells), M1 (muscle, heart, and brain), and M2 (early fetal tissues). In Escherichia coli there are two isozymes: PK-I (gene pykF) and PK-II (gene pykA). All PK isozymes seem to be tetramers of identical subunits of about 500 amino acid residues.

As a signature pattern for PK a conserved region was selected that includes a lysine residue which seems to be the acid/base catalyst responsible for the interconversion of pyruvate and enolpyruvate, and a glutamic acid residue implicated in the binding of the magnesium ion.

[1151] Consensus pattern: [LIVAC]-x-[LIVM](2)-[SAPCV]-K-[LIV]-E-[NKRST]-x-[DEQHS]-[GSTA]-[LIVM] [K is the active site residue] [E is a magnesium ligand]

[1152] [ 1] Muirhead H. Biochem. Soc. Trans. 18:193-196(1990).
[1153] 415. (PLDc) Phospholipase D. Active site motif

Phosphatidylcholine-hydrolyzing phospholipase D (PLD) isoforms are activated by ADP-ribosylation factors (ARFs).

PLD produces phosphatidic acid from phosphatidylcholine, which may be essential for the formation of certain types of transport vesicles or may be constitutive vesicular transport to signal transduction pathways.

PC-hydrolyzing PLD is a homologue of cardiolipin synthase, phosphatidylserine synthase, bacterial PLDs, and viral proteins.

Each of these appears to possess a domain duplication which is apparent by the presence of two motifs containing well-conserved histidine, lysine, and/or asparagine residues which may contribute to the active site, aspartic acid. An E. coli endonuclease (nuc) and similar proteins appear to be PLD homologues but possess only one of these motifs. The profile contained here represents only the putative active site regions, since an accurate multiple alignment of the repeat units has not been achieved.
Number of members: 139
[1]

Medline: 96303814

A novel family of phospholipase D homologues that includes phospholipid synthases and putative endonucleases: identification of duplicated repeats and potential active site residues.
Ponting CP, Kerr ID;

Protein Sci 1996;5:914-922.

[2]Medline: 96334293 
A duplicated catalytic motif in a new superfamily of phosphohydrolases and phospholipid synthases that includes poxvirus envelope proteins.
Koonin EV;

Trends Biochem Sci 1996;21:242-243.
[3]Medline: 94327597

Cloning and expression of phosphatidylcholine-hydrolyzing phospholipase D from Ricinus communis L. 
Wang X, Xu L, Zheng L; 
J Biol Chem 1994;269:20312-20317. 
[4]Medline: 97386825 

Regulation of eukaryotic phosphatidylinositol-specific phospholipase C and phospholipase D.
Singer WD, Brown HA, Sternweis PC; 
Annu Rev Biochem 1997;66:475-509. 
[1154] 416. (PMI typel) Phosphomannose isomerase type I signatures 

Phosphomannose isomerase (EC 5.3.1.8) (PMI) [1,2] is the enzyme that catalyzes the interconversion of mannose-6-phosphate and fructose-6-phosphate. In eukaryotes, it is involved in the synthesis of GDP-mannose which is a constituent of N- and O-linked glycans as well as GPI anchors. In prokaryotes, it is involved in a variety of pathways including capsular polysaccharide biosynthesis and D-mannose metabolism.

Three classes of PMI have been defined on the basis of sequence similarities [1]. The first class comprises all known eukaryotic PMI as well as the enzyme encoded by the manA gene in enterobacteria such as Escherichia coli. Class I PMI's are proteins of about 42 to 50 Kd which bind a zinc ion essential for their activity.

As signature patterns for class I PMI, two conserved regions were selected. The first one is located in the N-terminal section of these proteins, the second in the C-terminal half. Both patterns contain a residue involved [3] in the binding of the zinc ion.

[1155] Consensus pattern: Y-x-D-x-N-H-K-P-E [E is a zinc ligand] 

Consensus pattern: H-A-Y-[LIVM]-x-G-x(2)-[LIVM]-E-x-M-A-x-S-D-N-x-[LIVM]-R-A-G-x-T-P-K [H is a zinc ligand]
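Where an entry gives two signatures with a stated location, such as the N-terminal and C-terminal patterns for class I PMI, a scan can require both hits and check their order. The following is a minimal sketch under that assumption. The regular expressions are hand translations of the two patterns above (the bracket in the second pattern is read as [LIVM]), and the function name and test sequences are hypothetical.

```python
import re

# Y-x-D-x-N-H-K-P-E  (N-terminal section; E is a zinc ligand)
N_TERMINAL = re.compile(r"Y.D.NHKPE")
# H-A-Y-[LIVM]-x-G-x(2)-[LIVM]-E-x-M-A-x-S-D-N-x-[LIVM]-R-A-G-x-T-P-K
# (C-terminal half; H is a zinc ligand)
C_TERMINAL = re.compile(r"HAY[LIVM].G.{2}[LIVM]E.MA.SDN.[LIVM]RAG.TPK")


def class_one_pmi_hits(sequence):
    """Return 1-based start positions (n_hit, c_hit) if both signatures
    occur in N-to-C order, otherwise None."""
    n_hit = N_TERMINAL.search(sequence)
    if n_hit is None:
        return None
    # Only accept a C-terminal signature located after the N-terminal one.
    c_hit = C_TERMINAL.search(sequence, n_hit.end())
    if c_hit is None:
        return None
    return n_hit.start() + 1, c_hit.start() + 1


# Hypothetical fragments, made up for illustration only.
FIRST = "YADLNHKPE"
SECOND = "HAYLTGSSVEQMATSDNVLRAGLTPK"
in_order = class_one_pmi_hits("MM" + FIRST + "GGGG" + SECOND + "AA")
reversed_order = class_one_pmi_hits("MM" + SECOND + "GGGG" + FIRST + "AA")
```

Requiring both signatures in the expected order is stricter than accepting either one alone; whether that trade-off is appropriate depends on how complete the sequences being scanned are.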

[ 1] Proudfoot A.E.I., Turcatti G., Wells T.N.C., Payton M.A., Smith D.J. Eur. J. Biochem. 219:415-423(1994).
[ 2] Coulin R, Magnenat E., Proudfoot A.E.I., Payton M.A., Scully P., Wells T.N.C. Biochemistry 32:14139-14144(1993).

[ 3] Cleasby A., Wonacott A., Skarzynski T., Hubbard R.E., Davies G.J., Proudfoot A.E.I., Bernard A.R., Payton M.A., Wells T.N.C. Nat. Struct. Biol. 3:470-479(1996).

[1156] 417. (PNP UDP 1) Purine and other phosphorylases family 1 signature 
The following phosphorylases belong to the same family:

- Purine nucleoside phosphorylase (EC 2.4.2.1) (PNP) from most bacteria (gene deoD). This enzyme catalyzes the cleavage of guanosine or inosine to respective bases and sugar-1-phosphate molecules [1].
- Uridine phosphorylase (EC 2.4.2.3) (UdRPase) from bacteria (gene udp) and mammals. Catalyzes the cleavage of uridine into uracil and ribose-1-phosphate. The products of the reaction are used either as carbon and energy sources or in the rescue of pyrimidine bases for nucleotide synthesis [2].
- 5'-methylthioadenosine phosphorylase (EC 2.4.2.28) (MTA phosphorylase) from Sulfolobus solfataricus [3].

As a signature pattern, a conserved region was selected in the central part of these enzymes.
[1157] Consensus pattern: [GST]-x-G-[LIVM]-G-x-[PA]-S-x-[GSTA]-I-x(3)-E-L

Note: it should be noted that mammalian and some bacterial PNP as well as eukaryotic MTA phosphorylase belong to a different family of phosphorylases (see <PDOC00954>).

[ 1] Takehara M., Ling R, Izawa S., Inoue Y., Kimura A. Biosci. Biotechnol. Biochem. 59:1987-1990(1995).

[ 2] Watanabe S.-I., Hino A., Wada K., Eliason J.R, Uchida T. J. Biol. Chem. 270:12191-12196(1995).
[ 3] Cacciapuoti G., Porcelli M., Bertoldo C., De Rosa M., Zappia V. J. Biol. Chem. 269:24762-24769(1994).

[1158] 418. (PP2C) Protein phosphatase 2C signature 
Protein phosphatase 2C (PP2C) is one of the four major classes of mammalian serine/threonine specific protein phosphatases (EC 3.1.3.16). PP2C [1] is a monomeric enzyme of about 42 Kd which shows broad substrate specificity and is dependent on divalent cations (mainly manganese and magnesium) for its activity. Its exact physiological role is still unclear. Three isozymes are currently known in mammals: PP2C-alpha, -beta and -gamma. In yeast, there are at least four PP2C homologs: phosphatase PTC1 [2] which has weak tyrosine phosphatase activity in addition to its activity on serines, phosphatases PTC2 and PTC3, and hypothetical protein YBR125c. Isozymes of PP2C are also known from Arabidopsis thaliana (ABI1, PPH1), Caenorhabditis elegans (FEM-2, F42G9.1, T23F11.1), Leishmania chagasi and Paramecium tetraurelia.

In Arabidopsis thaliana, the kinase associated protein phosphatase (KAPP) [3] is an enzyme that dephosphorylates the Ser/Thr receptor-like kinase RLK5 and which contains a C-terminal PP2C domain.

PP2C does not seem to be evolutionarily related to the main family of serine/threonine phosphatases: PP1, PP2A and PP2B. However, it is significantly similar to the catalytic subunit of pyruvate dehydrogenase phosphatase (EC 3.1.3.43) (PDPC) [4] which catalyzes dephosphorylation and concomitant reactivation of the alpha subunit of the E1 component of the pyruvate dehydrogenase complex. PDPC is a mitochondrial enzyme and, like PP2C, is magnesium-dependent.
As a signature pattern, the best conserved region was selected which is located in the N-terminal part and contains a perfectly conserved tripeptide. This region includes a conserved aspartate residue involved in divalent cation binding [5].

[1159] Consensus pattern: [LIVMFY]-[LIVMFYA]-[GSAC]-[LIVM]-[FYC]-D-G-H-[GAV]

Note: PP2C belongs [6] to a superfamily which also includes bacterial proteins such as Bacillus spoIIE, rsbU and rsbX, Synechocystis PCC 6803 icfG as well as a domain in fungal adenylate cyclases.

[ 1] Wenk J., Trompeter H.-I., Pettrich K.-G., Cohen P.T.W., Campbell D.G., Mieskes G. FEBS Lett. 297:135-138(1992).

[ 2] Maeda T., Tsai A.Y.M., Saito H. Mol. Cell. Biol. 13:5408-5417(1993).

[ 3] Stone J.M., Collinge M.A., Smith R.D., Horn M.A., Walker J.C. Science 266:793-795(1994).
[ 4] Lawson J.E., Niu X.-D., Browning K.S., Trong H.L., Yan J., Reed L.J. Biochemistry 32:8987-8993(1993).
[ 5] Das A.K., Helps N.R., Cohen P.T.W., Barford D. EMBO J. 24:6798-6809(1996).
[ 6] Bork P., Brown N.P., Hegyi H., Schultz J. Protein Sci. 5:1421-1425(1996).

[1160] 419 (PPTA) Protein prenytlransf erases alpha subunit repeal signature 

Protein prenyltransferases catalyze the transfer of an isoprenyl moiety to a cysteine four residues from the C-terminus
of several proteins. They are heterodimeric enzymes consisting of alpha and beta subunits. The alpha subunit is thought
to participate in a stable complex with the isoprenyl substrate; the beta subunit binds the peptide substrate. Distinct
protein prenyltransferases might share a common alpha subunit. Both the alpha and beta subunit show repetitive
sequence motifs [1]. These repeats have distinct structural and functional implications and are unrelated to each other.
Known protein prenyltransferase alpha subunits are:

- Mammalian protein farnesyltransferase alpha subunit.
- Yeast protein RAM2, a protein farnesyltransferase alpha subunit.
- Yeast protein BET4, a protein geranylgeranyltransferase alpha subunit.

The conserved domain of the alpha subunit consists of about 34 amino acids and is repeated five times. It contains
an invariant tryptophan possibly involved in heterodimerization with the conserved phenylalanines in the repeated
domains of the beta subunits, via hydrophobic bonds. The signature pattern for this domain is centered on the invariant
tryptophan.

[1161] Consensus pattern: [PSIAV]-x-[NDFV]-[NEQIY]-x-[LIVMAGP]-W-[NQSTHF]-[FYHQ]-[LIVMR] 
[1162] [ 1] Boguski M.S., Murray A.W., Powers S. New Biol. 4:408-411(1992). 

[1163] 420. (PR55) Protein phosphatase 2A regulatory subunit PR55 signatures

Protein phosphatase 2A (PP2A) is a serine/threonine phosphatase involved in many aspects of cellular function,
including the regulation of metabolic enzymes and proteins involved in signal transduction. PP2A is a trimeric enzyme
that consists of a core composed of a catalytic subunit associated with a 65 Kd regulatory subunit (PR65), also called
subunit A; this complex then associates with a third variable subunit (subunit B), which confers distinct properties to
the holoenzyme [1]. One of the forms of the variable subunit is a 55 Kd protein (PR55) which is highly conserved in
mammals (where three isoforms are known to exist), Drosophila and yeast (gene CDC55). This subunit could perform
a substrate recognition function or be responsible for targeting the enzyme complex to the appropriate subcellular
compartment.

As signature patterns, two perfectly conserved sequences of 15 residues were selected; one located in the N-terminal
region, the other in the center of the protein.

[1164] Consensus pattern: E-F-D-Y-L-K-S-L-E-I-E-E-K-I-N

Consensus pattern: N-[AG]-H-[TA]-Y-H-I-N-S-I-S-[LIVM]-N-S-D

[1165] [ 1] Mayer-Jaekel R., Hemmings B.A. Trends Cell Biol. 4:287-291(1994). 
[1166] 421. N-(5'phosphoribosyl)anthranilate (PRA) isomerase

[1] Wilmanns M, Priestle JP, Niermann T, Jansonius JN;
J Mol Biol 1992;223:477-507.

[1167] 422. (PRK) Phosphoribulokinase signature 
Phosphoribulokinase (EC 2.7.1.19) (PRK) [1,2] is one of the enzymes specific to the Calvin's reductive pentose
phosphate cycle, which is the major route by which carbon dioxide is assimilated and reduced by autotrophic organisms.
PRK catalyzes the ATP-dependent phosphorylation of ribulose 5-phosphate into ribulose 1,5-bisphosphate, which is
the substrate for RubisCO. PRK's of diverse origins show different properties with respect to the size of the protein,
the subunit structure, or the enzymatic regulation. However, an alignment of the sequences of PRK from plants, algae,
photosynthetic and chemoautotrophic bacteria shows that there are a few regions of sequence similarity. As a signature
pattern, one of these regions was selected.

[1168] Consensus pattern: K-[LIVM]-x-R-D-x(3)-R-G-x-[ST]-x-E 
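The consensus patterns in this section use a compact notation: residues are separated by "-", "x" stands for any residue, square brackets list the residues allowed at a position, and a number in parentheses is a repeat count. As an illustration only, the following minimal sketch translates such a pattern into a Python regular expression. The function name `prosite_to_regex` and the example peptide are hypothetical and are not part of the disclosure; the handling of curly braces (excluded residues) and of ranges such as x(2,4) follows the usual convention for this notation and is an assumption here, since neither form appears in the patterns above.

```python
import re


def prosite_to_regex(pattern: str) -> str:
    """Translate a consensus pattern such as 'K-[LIVM]-x-R-D-x(3)' into a regex string."""
    parts = []
    for element in pattern.split("-"):
        # Separate the residue specification from an optional repeat count, e.g. x(3) or x(2,4).
        m = re.fullmatch(r"(.+?)(?:\((\d+)(?:,(\d+))?\))?", element)
        core, lo, hi = m.group(1), m.group(2), m.group(3)
        if core == "x":
            token = "."                        # any residue
        elif core.startswith("[") and core.endswith("]"):
            token = core                       # set of allowed residues
        elif core.startswith("{") and core.endswith("}"):
            token = "[^" + core[1:-1] + "]"    # set of excluded residues (assumed convention)
        else:
            token = re.escape(core)            # single fixed residue
        if lo and hi:
            token += "{%s,%s}" % (lo, hi)
        elif lo:
            token += "{%s}" % lo
        parts.append(token)
    return "".join(parts)


# Phosphoribulokinase signature as given in the text.
PRK_REGEX = prosite_to_regex("K-[LIVM]-x-R-D-x(3)-R-G-x-[ST]-x-E")

# Hypothetical peptide, invented for this example, that contains the motif.
hit = re.search(PRK_REGEX, "AAKLARDAAARGASAEAA")
```

With this sketch, `PRK_REGEX` is the string `K[LIVM].RD.{3}RG.[ST].E`, and `hit.start()` gives the zero-based offset of the motif in the example peptide.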

[ 1] Kossmann J., Klintworth R., Bowien B. Gene 85:247-252(1989). 
[ 2] Gibson J.L., Chen J.-H., Tower P.A., Tabita F.R. Biochemistry 29:8085-8093(1990).

[1169] 423. (PRPP synt) Phosphoribosyl pyrophosphate synthetase signature 

Phosphoribosyl pyrophosphate synthetase (EC 2.7.6.1) (PRPP synthetase) catalyzes the formation of PRPP from ATP
and ribose 5-phosphate. PRPP is then used in various biosynthetic pathways, for example in the formation of purines,
pyrimidines, histidine and tryptophan. PRPP synthetase requires inorganic phosphate and magnesium ions for its
stability and activity.

In mammals, three isozymes of PRPP synthetase are found; in yeast there are at least four isozymes.
As a signature pattern for this enzyme, a very conserved region was selected that has been suggested to be involved
in binding divalent cations [1]. This region contains two conserved aspartic acid residues as well as a histidine, which
are all potential ligands for a cation such as magnesium.

[1170] Consensus pattern: D-[LI]-H-[SA]-x-Q-[IMST]-[QM]-G-[FY]-F-x(2)-P-[LIVMFC]-D 

[1171] [ 1] Bower S.G., Harlow K.W., Switzer R.L., Hove-Jensen B. J. Biol. Chem. 264:10287-10291(1989).

[1172] 424. (PRTP) Herpesvirus processing and transport protein 

The members of this family are associated with capsid intermediates during packaging of the virus.
Number of members: 31
[1]
Medline: 98362148
Herpes simplex virus type 1 cleavage and packaging proteins
UL15 and UL28 are associated with B but not C capsids during
packaging. Yu D, Weller SK;
J Virol 1998;72:7428-7439.
[1173] 425. Photosystem I psaG / psaK (PSI PSAK) proteins signature 

Photosystem I (PSI) [1] is an integral membrane protein complex that uses light energy to mediate electron transfer
from plastocyanin to ferredoxin. It is found in the chloroplasts of plants and cyanobacteria. PSI is composed of at least
14 different subunits, two of which, PSI-G (gene psaG) and PSI-K (gene psaK), are small hydrophobic proteins of about
7 to 9 Kd and evolutionarily related [2]. Both seem to contain two transmembrane regions. Cyanobacteria seem to
encode only for PSI-K.

[1174] As a signature pattern, the best-conserved region was selected which seems to correspond to the second 
transmembrane region. 

- Consensus pattern: [GT]-F-x-[LIVM]-x-[DEA]-x(2)-[GA]-x-[GTA]-[SA]-x-G-H-x-[LIVM]-[GA] 
[1] Golbeck J.H. Biochim. Biophys. Acta 895:167-204(1987). 

[2] Kjaerulff S., Andersen B., Nielsen V.S., Moller B.L., Okkels J.S. J. Biol. Chem. 268:18912-18916(1993).

[1175] 426. PTR2 family proton/oligopeptide symporters signatures

A family of eukaryotic and prokaryotic proteins that seem to be mainly involved in the intake of small peptides with the
concomitant uptake of a proton has been recently characterized [1,2]. Proteins that belong to this family are:

- Fungal peptide transporter PTR2.
- Mammalian intestine proton-dependent oligopeptide transporter PeptT1.
- Mammalian kidney proton-dependent oligopeptide transporter PeptT2.
- Drosophila opt1.
- Arabidopsis thaliana peptide transporters PTR2-A and PTR2-B (also known as the histidine transporting protein
  NTR1).
- Arabidopsis thaliana proton-dependent nitrate/chlorate transporter CHL1.
- Lactococcus proton-dependent di- and tri-peptide transporter dtpT.
- Caenorhabditis elegans hypothetical protein C06G8.2.
- Caenorhabditis elegans hypothetical protein F56F4.5.
- Caenorhabditis elegans hypothetical protein K04E7.2.
- Escherichia coli hypothetical protein ybgH.
- Escherichia coli hypothetical protein ydgR.
- Escherichia coli hypothetical protein yhiP.
- Escherichia coli hypothetical protein yjdL.
- Bacillus subtilis hypothetical protein yclF.

These integral membrane proteins are predicted to comprise twelve transmembrane regions. As signature patterns,
two of the best conserved regions were selected. The first is a region that includes the end of the second transmembrane
region, a cytoplasmic loop as well as the third transmembrane region. The second pattern corresponds to the core of
the fifth transmembrane region.

- Consensus pattern: [GA]-[GAS]-[LIV…
[GSTAV]-x-[LIVMF]-x(3)-[GA]

- Consensus pattern: [FYT]-x(2)-[LMFY]-[FYV]-[LIVMFYWA]-x-[IVG]-N-[LIVMAG]-G-[GSA]-[LIMF]

[ 1] Paulsen I.T., Skurray R.A. Trends Biochem. Sci. 19:404-404(1994).
[ 2] Steiner H.-Y., Naider F., Becker J.M. Mol. Microbiol. 16:825-834(1995).

[1176] 427. Pumilio-family RNA binding domains (aka PUM-HD, Pumilio homology domain) 
Puf domains are necessary and sufficient for sequence-specific RNA binding in fly Pumilio and worm FBF-1 and FBF-2.
Both proteins function as translational repressors in early embryonic development by binding sequences in the 3' UTR
of target mRNAs (e.g. the nanos response element (NRE) in fly Hunchback mRNA, or the point mutation element (PME)
in worm fem-3 mRNA). Other proteins that contain Puf domains are also plausible RNA binding proteins. JSN1_YEAST,
for instance, appears to also contain a single RRM domain by HMM analysis.

Puf domains usually occur as a tandem repeat of 8 domains.

The Pfam model does not necessarily recognize all 8 domains in all sequences; some sequences appear to have 5
or 6 domains on initial analysis, but further analysis suggests the presence of additional divergent domains.
[1177] [1] Zhang B, Gallegos M, Puoti A, Durkin E, Fields S, Kimble J, 

Wickens MP. Nature 1997;390:477-484. [2] Zamore PD, Williamson JR, Lehmann R. RNA 1997;3:1421-1433. 
[1178] 428. PWWP domain. The PWWP domain is named after a conserved Pro-Trp-Trp-Pro motif. The function of
the domain is currently unknown. Number of members: 19
[1179] [1] Medline: 98282232. WHSC1, a 90 kb SET domain-containing gene, expressed in early development and
homologous to a Drosophila dysmorphy gene maps in the Wolf-Hirschhorn syndrome critical region and is fused to
IgH in t(4;14) multiple myeloma. Stec I, Wright TJ, van Ommen GJB, de Boer PAJ, van Haeringen A, Moorman AFM,
Altherr MR, den Dunnen JT; Hum Mol Genet 1998;7:1071-1082.
[1180] 429. PX domain 

Eukaryotic domain of unknown function present in phox proteins, PLD isoforms, a PI3K isoform.
Number of members: 71
[1]
Medline: 97084820
Novel domains in NADPH oxidase subunits, sorting nexins, and
PtdIns 3-kinases: binding partners of SH3 domains?
Ponting CP;
Protein Sci 1996;5:2353-2357.
[1181] 430. ParA family ATPase
[1]
Medline: 91141297
A family of ATPases involved in active partitioning of diverse bacterial plasmids.
Motallebi-Veshareh M, Rouch DA, Thomas CM;
Mol Microbiol 1990;4:1455-1463.
Number of members: 122

[1182] 431. (Parvo coat) Parvovirus coat protein. 72 members.
[1183] 432. Pectinesterase signatures

Pectinesterase (EC 3.1.1.11) (pectin methylesterase) catalyzes the hydrolysis of pectin into pectate and methanol. In
plants, it plays an important role in cell wall metabolism during fruit ripening. In plant bacterial pathogens such as
Erwinia carotovora and in fungal pathogens such as Aspergillus niger, pectinesterase is involved in maceration and
soft-rotting of plant tissue.

Prokaryotic and eukaryotic pectinesterases share a few regions of sequence similarity [1,2,3]. Two of these regions
were selected as signature patterns.
The first is based on a region in the N-terminal section of these enzymes; it contains a conserved tyrosine which may
play a role in the catalytic mechanism [3]. The second pattern corresponds to the best conserved region, an octapeptide
located in the central part of these enzymes.

- Consensus pattern: [GSTNP]-x(6)-[FYVHR]-[IVN]-[KEP]-x-G-[STIVKRQ]-Y-[DNQKRMV]-[EP]-x(3)-[LIMVA]
- Consensus pattern: [IV]-x-G-[STAD]-[LIVT]-D-[FYI]-[IV]-[FSN]-G

[ 1] Ray J., Knapp J., Grierson D., Bird C., Schuch W. Eur. J. Biochem. 174:119-124(1988).
[ 2] Plastow G.S. Mol. Microbiol. 2:247-254(1988).
[ 3] Markovic O., Joernvall H. Protein Sci. 1:1288-1292(1992).

[1184] 433. Pentapeptide repeats (8 copies) 

These repeats are found in many cyanobacterial proteins. 

The repeats were first identified in hglK [1]. The function of these repeats is unknown.
The structure of this repeat has been predicted to be a beta-helix [2].
The repeat can be approximately described as A(D/N)LXX, where X can be any amino acid. Number of members: 75
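The approximate repeat description A(D/N)LXX maps directly onto a five-residue regular expression, which makes it easy to look for tandem copies in a protein sequence. The sketch below is illustrative only: the function name `tandem_repeat_runs`, the `min_copies` parameter and the test peptide are hypothetical, and a real search would more likely use a profile or HMM, since the text itself calls the description approximate.

```python
import re

# One pentapeptide unit: A, then D or N, then L, then any two residues.
REPEAT_RUN = re.compile(r"(?:A[DN]L..)+")


def tandem_repeat_runs(seq, min_copies=2):
    """Return (zero-based start, number of copies) for each run of consecutive repeats."""
    runs = []
    for m in REPEAT_RUN.finditer(seq):
        copies = (m.end() - m.start()) // 5   # each unit is exactly five residues long
        if copies >= min_copies:
            runs.append((m.start(), copies))
    return runs


# Hypothetical peptide, invented for this example, with three consecutive units.
example_runs = tandem_repeat_runs("MKADLSGANLTQADLRRGG")
```

Here `example_runs` holds a single run starting at offset 2 with three copies. Raising `min_copies` toward 8 would restrict hits to sequences carrying the full set of copies named in the family title.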

[1]
Medline: 96062225
The hglK gene is required for localization of heterocyst-specific glycolipids in the cyanobacterium
Anabaena sp. strain PCC 7120.
Black K, Buikema WJ, Haselkorn R;
J Bacteriol 1995;177:6440-6448.
[2] Medline: 98318059
Structure and distribution of pentapeptide repeats in bacteria.
Bateman A, Murzin A, Teichmann SA;
Protein Sci 1998;7:1477-1480.
[3] Medline: 98316713
Characterisation of an Arabidopsis cDNA encoding a thylakoid lumen protein related to a novel 'pentapeptide repeat'
family of proteins.
Kieselbach T, Mant A, Robinson C, Schroder WP;
FEBS Lett 1998;428:241-244.

[1185] 434. Polypeptide deformylase 
[1] 

Medline: 97002011 

A new subclass of the zinc metalloproteases superfamily revealed by the solution structure of peptide deformylase.
Meinnel T, Blanquet S, Dardel F;
J Mol Biol 1996;262:375-386.
[2] Medline: 98332750
Solution structure of nickel-peptide deformylase.
Dardel F, Ragusa S, Lazennec C, Blanquet S, Meinnel T;
J Mol Biol 1998;280:501-513.

Number of members: 21 

[1186] 435. Peptidyl-tRNA hydrolase signatures 

Peptidyl-tRNA hydrolase (EC 3.1.1.29) (PTH) is a bacterial enzyme that cleaves peptidyl-tRNA or N-acyl-aminoacyl-tRNA
to yield free peptides or N-acyl-amino acids and tRNA. The natural substrate for this enzyme may be peptidyl-tRNA
which drops off the ribosome during protein synthesis [1,2]. Bacterial PTH has been found [2,3] to be evolutionarily
related to yeast hypothetical protein YHR189W.
PTH and YHR189W are proteins of about 200 amino acid residues. As signature patterns, two conserved regions were
selected that each contain a histidine. The first of these regions is located in the N-terminal section, the other in the
central part.

- Consensus pattern: [FY]-x(2)-T-R-H-N-x-G-x(2)-[LIVMFA](2)-[DE]

- Consensus pattern: [GS]-x(3)-H-N-G-[LIVM]-[KR]-[DNS]-[LIVMT]

[ 1] Garcia-Villegas M.R., De La Vega F.M., Galindo J.M., Segura M., Buckingham R.H., Guarneros G. EMBO J.
10:3549-3555(1991).
[ 2] De La Vega F.M., Galindo J.M., Old I.G., Guarneros G. Gene 169:97-100(1996).
[ 3] Ouzounis C., Bork P., Casari G., Sander C. Protein Sci. 4:2424-2428(1995).

[1187] 436. (Peptidase M17) Cytosol aminopeptidase signature

Cytosol aminopeptidase is a eukaryotic cytosolic zinc-dependent exopeptidase that catalyzes the removal of
unsubstituted amino-acid residues from the N-terminus of proteins. This enzyme is often known as leucine aminopeptidase
(EC 3.4.11.1) (LAP) but has been shown [1] to be identical with prolyl aminopeptidase (EC 3.4.11.5). Cytosol
aminopeptidase is a hexamer of identical chains, each of which binds two zinc ions.

Cytosol aminopeptidase is highly similar to Escherichia coli pepA, a manganese dependent aminopeptidase. Residues 
involved in zinc ion-binding [2] in the mammalian enzyme are absolutely conserved in pepA where they presumably 
bind manganese. 

A cytosol aminopeptidase from Rickettsia prowazekii [3] and one from Arabidopsis thaliana also belong to this family.
As a signature pattern for these enzymes, a perfectly conserved octapeptide was selected which contains two residues
involved in binding metal ions: an aspartate and a glutamate.

Consensus pattern: N-T-D-A-E-G-R-L [The D and the E are zinc/manganese ligands] 
Note: these proteins belong to family M17 in the classification of peptidases [4,E1]. 
[ 1] Matsushima M., Takahashi T., Ichinose M., Miki K., Kurokawa K., Takahashi K. Biochem. Biophys. Res. Commun.
178:1459-1464(1991).
[ 2] Burley S.K., David P.R., Sweet R.M., Taylor A., Lipscomb W.N. J. Mol. Biol. 224:113-140(1992).
[ 3] Wood D.O., Solomon M.J., Speed R.R. J. Bacteriol. 175:159-165(1993).
[ 4] Rawlings N.D., Barrett A.J. Meth. Enzymol. 248:183-228(1995).

[1188] 437. Assemblin (Peptidase family S21) 
[1] 

Medline: 96399137
Three-dimensional structure of human cytomegalovirus protease.
Shieh HS, Kurumbail RG, Stevens AM, Stegeman RA, Sturman EJ,
Pak JY, Wittwer AJ, Palmier MO, Wiegand RC, Holwerda BC,
Stallings WC;
Nature 1996;383:279-282.
Number of members: 29

[1189] 438. Pollen proteins Ole e I family signature 

The following plant pollen proteins, whose biological function is not yet known, are structurally related [1]: 

- Olive tree pollen major allergen (Ole e I).
- Tomato anther-specific protein LAT52.
- Maize pollen-specific protein ZmC13.

These proteins are most probably secreted and consist of about 145 residues. As shown in the following schematic
representation, there are six cysteines which are conserved in the sequence of these proteins. They seem to be
involved in disulfide bonds.

xxxxxxCxCxxxxxxxxxCxxxxxxxxxxxxxxxxxCxxxxxCxxxxxxxxxxxxxxxxxxxxCxxxxxxx
******
'C': conserved cysteine involved in a disulfide bond.
'*': position of the pattern.

- Consensus pattern: [EQ]-G-x-V-Y-C-D-T-C-R [The two C's are probably involved in disulfide bonds] 
[1190] [ 1] Villalba M., Batanero E., Lopez-Otin C., Sanchez L.M., Monsalve R.I., Gonzalez De La Pena M.A., Lahoz
C., Rodriguez R. Eur. J. Biochem. 216:863-869(1993).
[1191] 439. Pollen allergen 

This family contains allergens Lol p I, II and III from Lolium perenne.
Number of members: 49
[1]
Medline: 90105394
Complete primary structure of a Lolium perenne (perennial rye grass) pollen allergen, Lol p III: comparison with known
Lol p I and II sequences.
Ansari AA, Shenbagamurthi P, Marsh DG;
Biochemistry 1989;28:8665-8670.
[1192] 440. Porphobilinogen deaminase cofactor-binding site 

Porphobilinogen deaminase (EC 4.3.1.8), or hydroxymethylbilane synthase, is an enzyme involved in the biosynthesis
of porphyrins and related macrocycles. It catalyzes the assembly of four porphobilinogen (PBG) units in a head-to-tail
fashion to form hydroxymethylbilane.

The enzyme covalently binds a dipyrromethane cofactor to which the PBG subunits are added in a stepwise fashion. 
In the Escherichia coli enzyme (gene hemC), this cofactor has been shown [1] to be bound by the sulfur atom of a 
cysteine. The region around this cysteine is conserved in porphobilinogen deaminases from various prokaryotic and 
eukaryotic sources. 

- Consensus pattern: E-R-x-[LIVMFA]-x(3)-[LIVMF]-x-G-[GSA]-C-x-[IVT]-P-[LIVMF]-[GSA] [C is the cofactor attachment site]

[1193] [ 1] Miller A.D., Hart G.J., Packman L.C., Battersby A.R. Biochem. J. 254:915-918(1988). 
[1194] 441. Presenilin

Mutations in presenilin-1 are a major cause of early onset Alzheimer's disease [2]. It has been found that presenilin-1
(Swiss: P49768) binds to beta-catenin in vivo [4]. This family also contains SPE proteins from C. elegans.
Number of members: 23
[1]
Medline: 98045995
Presenilins and Alzheimer's disease.
Kim TW, Tanzi RE;
Curr Opin Neurobiol 1997;7:683-688.
[2] Medline: 98045995
Presenilins and Alzheimer's disease.
Kim TW, Tanzi RE;
Curr Opin Neurobiol 1997;7:683-688.
[3] Medline: 98099802
Interaction of presenilins with the filamin family of actin-binding proteins.
Zhang W, Han SW, McKeel DW, Goate A, Wu JY;
J Neurosci 1998;18:914-922.
[4] Medline: 99004850
Destabilisation of beta-catenin by mutations in presenilin-1 potentiates neuronal apoptosis.
Zhang Z, Hartmann H, Do VM, Abramowski D, Sturchler-Pierrat
C, Staufenbiel M, Sommer B, van de Wetering M, Clevers H,
Saftig P, De Strooper B, He X, Yankner BA;
Nature 1998;395:698-702.
[1195] 442. (Pribosyltran) Purine/pyrimidine phosphoribosyl transferases signature

Phosphoribosyltransferases (PRT) are enzymes that catalyze the synthesis of beta-n-5'-monophosphates from
phosphoribosylpyrophosphate (PRPP) and an enzyme specific amine. A number of PRT's are involved in the biosynthesis
of purine, pyrimidine, and pyridine nucleotides, or in the salvage of purines and pyrimidines. These enzymes are:

- Adenine phosphoribosyltransferase (EC 2.4.2.7) (APRT), which is involved in purine salvage.
- Hypoxanthine-guanine or hypoxanthine phosphoribosyltransferase (EC 2.4.2.8) (HGPRT or HPRT), which are
  involved in purine salvage.
- Orotate phosphoribosyltransferase (EC 2.4.2.10) (OPRT), which is involved in pyrimidine biosynthesis.
- Amido phosphoribosyltransferase (EC 2.4.2.14), which is involved in purine biosynthesis.
- Xanthine-guanine phosphoribosyltransferase (EC 2.4.2.22) (XGPRT), which is involved in purine salvage.

In the sequence of all these enzymes there is a small conserved region which may be involved in the enzymatic activity
and/or be part of the PRPP binding site [1].

- Consensus pattern: [LIVMFYWCTA]-[LIVM]-[LIVMA…
x-[STAR]

- Note: in position 11 of the pattern most of these enzymes have Gly.

[1196] [ 1] Hershey H.V., Taylor M.W. Gene 43:287-293(1986).
[1197] 443. (Pro CA)

Prokaryotic-type carbonic anhydrases signatures

Carbonic anhydrases (EC 4.2.1.1) (CA) are zinc metalloenzymes which catalyze the reversible hydration of carbon
dioxide. In Escherichia coli, CA (gene cynT) is involved in recycling carbon dioxide formed in the bicarbonate-dependent
decomposition of cyanate by cyanase (gene cynS). By this action, it prevents the depletion of cellular bicarbonate [1].
In photosynthetic bacteria and plant chloroplast, CA is essential to inorganic carbon fixation [2]. Prokaryotic and plant
chloroplast CA are structurally and evolutionarily related and form a family distinct from the one which groups the many
different forms of eukaryotic CA's (see <PDOC00146>). Hypothetical proteins yadF from Escherichia coli and HI1301
from Haemophilus influenzae also belong to this family. Two signature patterns were developed for this family of
enzymes. Both patterns contain conserved residues that could be involved in binding zinc (cysteine and histidine).

- Consensus pattern: C-[SA]-D-S-R-[LIVM]-x-[AP]

- Consensus pattern: [EQ]-Y-A-[LIVM]-x(2)-[LIVM]-x(4)-[LIVMF](3)-x-G-H-x(2)-C-G

[ 1] Guilloton M.B., Korte J.J., Lamblin A.F., Fuchs J.A., Anderson P.M. J. Biol. Chem. 267:3731-3734(1992).
[ 2] Fukuzawa H., Suzuki E., Komukai Y., Miyachi S. Proc. Natl. Acad. Sci. U.S.A. 89:4437-4441(1992). 
[1198] 444. (Prolyl_oligopep)

Prolyl oligopeptidase family serine active site

[1199] The prolyl oligopeptidase family [1,2,3] consists of a number of evolutionarily related peptidases whose catalytic
activity seems to be provided by a charge relay system similar to that of the trypsin family of serine proteases, but
which evolved by independent convergent evolution. The known members of this family are listed below.

- Prolyl endopeptidase (EC 3.4.21.26) (PE) (also called post-proline cleaving enzyme). PE is an enzyme that cleaves
  peptide bonds on the C-terminal side of prolyl residues. The sequence of PE has been obtained from a mammalian
  species (pig) and from bacteria (Flavobacterium meningosepticum and Aeromonas hydrophila); there is a high
  degree of sequence conservation between these sequences.
- Escherichia coli protease II (EC 3.4.21.83) (oligopeptidase B) (gene prtB) which cleaves peptide bonds on the
  C-terminal side of lysyl and argininyl residues.
- Dipeptidyl peptidase IV (EC 3.4.14.5) (DPP IV). DPP IV is an enzyme that removes N-terminal dipeptides
  sequentially from polypeptides having unsubstituted N-termini provided that the penultimate residue is proline.
- Yeast vacuolar dipeptidyl aminopeptidase A (DPAP A) (gene: STE13) which is responsible for the proteolytic
  maturation of the alpha-factor precursor.
- Yeast vacuolar dipeptidyl aminopeptidase B (DPAP B) (gene: DAP2).
- Acylamino-acid-releasing enzyme (EC 3.4.19.1) (acyl-peptide hydrolase). This enzyme catalyzes the hydrolysis
  of the amino-terminal peptide bond of an N-acetylated protein to generate an N-acetylated amino acid and a protein
  with a free amino-terminus.

[1200] A conserved serine residue has experimentally been shown (in E. coli protease II as well as in pig and bacterial
PE) to be necessary for the catalytic mechanism. This serine, which is part of the catalytic triad (Ser, His, Asp), is
generally located about 150 residues away from the C-terminal extremity of these enzymes (which are all proteins that
contain about 700 to 800 amino acids).

[1201] Consensus pattern: D-x(3)-A-x(3)-[LIVMFYW]-x(14)-G-x-S-x-G-G-[LIVMFYW](2) [S is the active site residue]
Sequences known to belong to this class detected by the pattern: ALL, except for yeast DPAP A.
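Because the text ties the active site serine both to a sequence pattern and to an approximate distance from the C-terminus, both can be checked together. The following sketch is offered as an illustration under stated assumptions: the function name `find_active_site_serine` and the synthetic test sequence are hypothetical, and the regular expression is simply a hand translation of the consensus pattern given above.

```python
import re

# Hand translation of D-x(3)-A-x(3)-[LIVMFYW]-x(14)-G-x-S-x-G-G-[LIVMFYW](2);
# the parentheses capture the putative active site serine.
POP_MOTIF = re.compile(r"D.{3}A.{3}[LIVMFYW].{14}G.(S).GG[LIVMFYW]{2}")


def find_active_site_serine(seq):
    """Return (1-based serine position, residues between the serine and the C-terminus)."""
    hits = []
    for m in POP_MOTIF.finditer(seq):
        ser_index = m.start(1)                      # zero-based index of the captured serine
        hits.append((ser_index + 1, len(seq) - ser_index - 1))
    return hits


# Synthetic sequence, invented for this example: 10 filler residues, the motif,
# then 150 filler residues up to the C-terminus.
synthetic = "M" * 10 + "DAAAAAAAL" + "A" * 14 + "GASAGGLL" + "K" * 150
serine_hits = find_active_site_serine(synthetic)
```

For the synthetic sequence, `serine_hits` contains one entry: the serine at position 36, with 155 residues following it. In a real family member one would expect the second value to be roughly 150, as stated above.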

[1202] Note: these proteins belong to families S9A/S9B/S9C in the classification of peptidases [4].

[ 1] Rawlings N.D., Polgar L., Barrett A.J. Biochem. J. 279:907-911(1991).
[ 2] Barrett A.J., Rawlings N.D.
[ 3] Polgar L., Szabo E.

[ 4] Rawlings N.D., Barrett A.J. Meth. Enzymol. 244:19-61(1994). 

[1203] 445. (Pterin 4a) 
Pterin 4 alpha carbinolamine dehydratase 
[1204] Pterin 4 alpha carbinolamine dehydratase is also known as DCoH (dimerisation cofactor of hepatocyte nuclear
factor 1-alpha).

[1205] Number of members: 11 

[1206] [1] Cronk JD, Endrizzi JA, Alber T; Medline: 97052967 High-resolution structures of the bifunctional enzyme
and transcriptional coactivator DCoH and its complex with a product analogue. Protein Sci 1996;5:1963-1972.
[1207] 446. (Pyridox oxidase)

Pyridoxamine 5'-phosphate oxidase signature

[1208] Pyridoxamine 5'-phosphate oxidase (EC 1.4.3.5) is a FMN flavoprotein involved in the de novo synthesis of
pyridoxine (vitamin B6) and pyridoxal phosphate. It oxidizes pyridoxamine-5-P (PMP) and pyridoxine-5-P (PNP) to
pyridoxal-5-P. The sequences of the enzyme from bacterial (genes pdxH or fprA) [1] and fungal (gene PDX3) [2] sources
show that this protein has been highly conserved throughout evolution.

PdxH is evolutionarily related [3] to one of the enzymes in the phenazine biosynthesis protein pathway, phzD (also
known as phzG). As a signature pattern, a highly conserved region was selected located in the C-terminal part of these
enzymes.

- Consensus pattern: [LIVF]-E-F-W-[QHG]-x(4)-R-[LIVM]-H-[DNE]-R

[ 1] Lam H.-M., Winkler M.E. J. Bacteriol. 174:6033-6045(1992).
[ 2] Loubbardi A., Karst F., Guilloton M., Marcireau C. J. Bacteriol. 177:1817-1823(1995).
[ 3] Pierson L.S. III, Gaffney T., Lam S., Gong F. FEMS Microbiol. Lett. 134:299-307(1995).

[1209] 447. (Pyrophosphatase) 
Inorganic pyrophosphatase signature 

[1210] Inorganic pyrophosphatase (EC 3.6.1.1) (PPase) [1,2] is the enzyme responsible for the hydrolysis of
pyrophosphate (PPi) which is formed principally as the product of the many biosynthetic reactions that utilize ATP. All known
PPases require the presence of divalent metal cations, with magnesium conferring the highest activity. Among other
residues, a lysine has been postulated to be part of or close to the active site. PPases have been sequenced from bacteria
such as Escherichia coli (homohexamer), thermophilic bacteria PS-3 and Thermus thermophilus, from the
archaebacteria Thermoplasma acidophilum, from fungi (homodimer), from a plant, and from bovine retina. In yeast, a
mitochondrial isoform of PPase has been characterized which seems to be involved in energy production and whose
activity is stimulated by uncouplers of ATP synthesis.

[1211] The sequences of PPases share some regions of similarity. As a signature pattern, a region was selected
that contains three conserved aspartates that are involved in the binding of cations.

- Consensus pattern: D-[SGDN]-D-[PE]-[LIVMF]-D-[LIVMGAC]
[The three D's bind divalent metal cations]

[ 1] Lahti R., Kolakowski L.F. Jr., Heinonen J., Vihinen M., Pohjanoksa K., Cooperman B.S. Biochim. Biophys. Acta
1038:338-345(1990).
[ 2] Cooperman B.S., Baykov A.A., Lahti R. Trends Biochem. Sci. 17:262-266(1992).

[1212] 448. (Peptidase S26) 
Signal peptidases I signatures. 

[1213] Signal peptidases (SPases) [1] (also known as leader peptidases) remove the signal peptides from secretory
proteins. In prokaryotes three types of SPases are known: type I (gene lepB) which is responsible for the processing of
the majority of exported pre-proteins; type II (gene lsp) which only processes lipoproteins, and a third type involved in
the processing of pili subunits. SPase I (EC 3.4.21.89) is an integral membrane protein that is anchored in the cytoplasmic
membrane by one (in B. subtilis) or two (in E. coli) N-terminal transmembrane domains with the main part of the protein
protruding in the periplasmic space. Two residues have been shown [2,3] to be essential for the catalytic activity of
SPase I: a serine and a lysine. SPase I is evolutionarily related to the yeast mitochondrial inner membrane protease
subunits 1 and 2 (genes IMP1 and IMP2) which catalyze the removal of signal peptides required for the targeting of
proteins from the mitochondrial matrix, across the inner membrane, into the inter-membrane space [4]. In eukaryotes
the removal of signal peptides is effected by an oligomeric enzymatic complex composed of at least five subunits: the
signal peptidase complex (SPC). The SPC is located in the endoplasmic reticulum membrane. Two components of
mammalian SPC, the 18 Kd (SPC18) and the 21 Kd (SPC21) subunits as well as the yeast SEC11 subunit have been
shown [5] to share regions of sequence similarity with prokaryotic SPases I and yeast IMP1/IMP2. Three signature
patterns have been developed for these proteins. The first signature contains the putative active site serine; the second
signature contains the putative active site lysine which is not conserved in the SPC subunits, and the third signature
corresponds to a conserved region of unknown biological significance which is located in the C-terminal section of all
these proteins.

[1214] Consensus pattern: [GS]-x-S-M-x-[PS]-[AT]-[LF] [S is an active site residue]
Consensus pattern: K-R-[LIVMSTA](2)-G-x-[PG]-G-[DE]-x-[LIVM]-x-[LIVMFY] [K is an active site residue]
Consensus pattern: [LIVMFYW](2)-x(2)-G-D-[NH]-x(3)-[SND]-x(2)-[SG]

[1215] [ 1] Dalbey R.E., von Heijne G. Trends Biochem. Sci. 17:474-478(1992).
[ 2] Sung M., Dalbey R.E. J. Biol. Chem. 267:13154-13159(1992).
[ 3] Black M.T. J. Bacteriol. 175:4957-4961(1993).
[ 4] Nunnari J., Fox T.D., Walter P. Science 262:1997-2004(1993).
[ 5] van Dijl J.M., de Jong A., Vehmaanpera J., Venema G., Bron S. EMBO J. 11:2819-2828(1992).
[ 6] Rawlings N.D., Barrett A.J. Meth. Enzymol. 244:19-61(1994).
[E1]

[1216] 449. (Peptidase C1) Eukaryotic thiol (cysteine) proteases active sites.

Eukaryotic thiol proteases (EC 3.4.22.-) [1] are a family of proteolytic enzymes which contain an active site cysteine.
Catalysis proceeds through a thioester intermediate and is facilitated by a nearby histidine side chain; an asparagine
completes the essential catalytic triad. The proteases which are currently known to belong to this family are listed
below (references are only provided for recently determined sequences).

- Vertebrate lysosomal cathepsins B (EC 3.4.22.1), H (EC 3.4.22.16), L (EC 3.4.22.15), and S (EC 3.4.22.27) [2].
- Vertebrate lysosomal dipeptidyl peptidase I (EC 3.4.14.1) (also known as cathepsin C) [2].
- Vertebrate calpains (EC 3.4.22.17). Calpains are intracellular calcium-activated thiol proteases that contain
  both an N-terminal catalytic domain and a C-terminal calcium-binding domain.
- Mammalian cathepsin K, which seems involved in osteoclastic bone resorption [3].
- Human cathepsin O [4].
- Bleomycin hydrolase. An enzyme that catalyzes the inactivation of the antitumor drug BLM (a glycopeptide).
- Plant enzymes: barley aleurain (EC 3.4.22.16), EP-B1/B4; kidney bean EP-C1, rice bean SH-EP; kiwi fruit actinidin
  (EC 3.4.22.14); papaya latex papain (EC 3.4.22.2), chymopapain (EC 3.4.22.6), caricain (EC 3.4.22.30), and
  proteinase IV (EC 3.4.22.25); pea turgor-responsive protein 15A; pineapple stem bromelain (EC 3.4.22.32); rape
  COT44; rice oryzain alpha, beta, and gamma; tomato low-temperature induced, Arabidopsis thaliana A494, RD19A
  and RD21A.
- House-dust mites allergens DerP1 and EurM1.
- Cathepsin B-like proteinases from the worms Caenorhabditis elegans (genes gcp-1, cpr-3, cpr-4, cpr-5 and cpr-6),
  Schistosoma mansoni (antigen SM31) and japonica (antigen SJ31), Haemonchus contortus (genes AC-1 and AC-2),
  and Ostertagia ostertagi (CP-1 and CP-3).
- Slime mold cysteine proteinases CP1 and CP2.
- Cruzipain from Trypanosoma cruzi and brucei.
- Trophozoite cysteine proteinase (TCP) from various Plasmodium species.
- Proteases from Leishmania mexicana, Theileria annulata and Theileria parva.
- Baculoviruses cathepsin-like enzyme (v-cath).
- Drosophila small optic lobes protein (gene sol), a neuronal protein that contains a calpain-like domain.
- Yeast thiol protease BLH1/YCP1/LAP3.
- Caenorhabditis elegans hypothetical protein C06G4.2, a calpain-like protein.

Two bacterial peptidases are also part of this family:

- Aminopeptidase C from Lactococcus lactis (gene pepC) [5].
- Thiol protease tpr from Porphyromonas gingivalis.

Three other proteins are structurally related to this family, but may have lost their proteolytic activity.

- Soybean oil body protein P34. This protein has its active site cysteine replaced by a glycine.
- Rat testin, a Sertoli cell secretory protein highly similar to cathepsin L but with the active site cysteine replaced
  by a serine. Rat testin should not be confused with mouse testin which is a LIM domain protein (see <PDOC00382>).
- Plasmodium falciparum serine-repeat protein (SERA), the major blood stage antigen. This protein of 111 Kd possesses
  a C-terminal thiol-protease-like domain [6], but the active site cysteine is replaced by a serine.

The sequences around the three active site residues are well conserved and can be used as signature patterns.

[1217] Consensus pattern: Q-x(3)-[GE]-x-C-[YW]-x(2)-[STAGC]-[STAGCV] [C is the active site residue]
- Note: the residue in position 4 of the pattern is almost always cysteine; the only exceptions are calpains (Leu), bleomycin hydrolase (Ser) and yeast YCP1 (Ser).
- Note: the residue in position 5 of the pattern is always Gly except in papaya protease IV where it is Glu.
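
As an illustration of how a consensus pattern written in this notation can be applied to a sequence, the following minimal sketch translates the notation into a Python regular expression and scans a sequence for the cysteine active-site signature. The helper name `prosite_to_regex` and the test sequence are assumptions made for this example only; they are not part of the original disclosure or of any particular library.

```python
import re

def prosite_to_regex(pattern: str) -> str:
    """Translate a PROSITE-style consensus pattern into a Python regex string.

    Hypothetical helper for illustration; handles x, [..], {..} and (n) / (n,m).
    """
    pieces = []
    for element in pattern.strip().strip("-").split("-"):
        # Separate the residue specification from an optional repeat such as (3) or (2,4).
        match = re.fullmatch(r"(.+?)(?:\((\d+)(?:,(\d+))?\))?", element)
        core, low, high = match.groups()
        if core == "x":                 # x stands for any residue
            core = "."
        elif core.startswith("{"):      # {ABC} stands for any residue except A, B or C
            core = "[^" + core[1:-1] + "]"
        # [ABC] and single letters are already valid regex syntax
        if low and high:
            core += "{%s,%s}" % (low, high)
        elif low:
            core += "{%s}" % low
        pieces.append(core)
    return "".join(pieces)

PATTERN = "Q-x(3)-[GE]-x-C-[YW]-x(2)-[STAGC]-[STAGCV]"
regex = prosite_to_regex(PATTERN)

toy = "MKAAQGSCGSCWAFSAVV"            # made-up test sequence, not a database entry
hit = re.search(regex, toy)
active_site_index = hit.start() + 6   # the active site C is the 7th pattern position
```

In this toy sequence the residue at pattern position 4 is also a cysteine, in line with the note above, while the active site cysteine is the one at pattern position 7.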

[1218] Consensus pattern: [LIVMGSTAN]-x-H-[GSACE]-[LIVM]-x-[LIVMAT](2)-G-x-[GSADNH] [H is the active site residue]

Consensus pattern: [FYCH]-[WI]-[LIVT]-x-[KRQAG]-N-^

is the active site residue] - Note: these proteins belong to family C1 (papain-type) and C2 (calpains) in the classification of peptidases [7,E1].

[1219] [ 1] Dufour E. Biochimie 70:1335-1342(1988).
[ 2] Kirschke H., Barrett A.J., Rawlings N.D. Protein Prof. 2:1587-1643(1995).
[ 3] Shi G.-P., Chapman H.A., Bhairi S.M., Deleeuw C., Reddy V.Y., Weiss S.J. FEBS Lett. 357:129-134(1995).
[ 4] Velasco G., Ferrando A.A., Puente X.S., Sanchez L.M., Lopez-Otin C. J. Biol. Chem. 269:27136-27142(1994).
[ 5] Chapot-Chartier M.P., Nardi M., Chopin M.C., Chopin A., Gripon J.C. Appl. Environ. Microbiol. 59:330-333(1993).
[ 6] Higgins D.G., McConnell D.J., Sharp P.M. Nature 340:604-604(1989).
[ 7] Rawlings N.D., Barrett A.J. Meth. Enzymol. 244:461-486(1994).

[1220] 450. (peptidase M24) Aminopeptidase P and proline dipeptidase signature (1). 

Aminopeptidase P (EC 3.4.11.9) is the enzyme responsible for the release of any N-terminal amino acid adjacent to a proline residue. Proline dipeptidase (EC 3.4.13.9) (prolidase) splits dipeptides with a prolyl residue in the carboxyl terminal position. Bacterial aminopeptidase P II (gene pepP) [1], proline dipeptidase (gene pepQ) [2], and human proline dipeptidase (gene PEPD) [3] are evolutionarily related. These proteins are manganese metalloenzymes. Yeast hypothetical proteins YER078c and YFR006w and Mycobacterium tuberculosis hypothetical protein MtCY49.29c also belong to this family. As a signature pattern for these enzymes, a conserved region that contains three histidine residues has been developed.

[1221] Consensus pattern: [HA]-[GSYR]-[LIVMT]-[SG]-H-x-[LIV]-G-[LIVM]-x-[IV]-H-[DE]
[1222] [ 1] Yoshimoto T., Tone H., Honda T., Osatomi K., Kobayashi R., Tsuru D. J. Biochem. 105:412-416(1989).
[ 2] Nakahigashi K., Inokuchi H. Nucleic Acids Res. 18:6439-6439(1990).
[ 3] Endo F., Tanoue A., Nakai H., Hata A., Indo Y., Titani K., Matsuda I. J. Biol. Chem. 264:4476-4481(1989).
[ 4] Rawlings N.D., Barrett A.J. Meth. Enzymol. 248:183-228(1995).

[1223] Methionine aminopeptidase signatures (2). Methionine aminopeptidase (EC 3.4.11.18) (MAP) is responsible for the removal of the amino-terminal (initiator) methionine from nascent eukaryotic cytosolic and cytoplasmic prokaryotic proteins if the penultimate amino acid is small and uncharged. All MAP studied to date are monomeric proteins that require cobalt ions for activity. Two subfamilies of MAP enzymes are known to exist [1,2]. While being evolutionarily related, they only share a limited amount of sequence similarity, mostly clustered around the residues shown, in the Escherichia coli MAP [3], to be involved in cobalt-binding. The first family consists of enzymes from prokaryotes as well as eukaryotic MAP-1, while the second group is made up of archaebacterial MAP and eukaryotic MAP-2. The second subfamily also includes proteins which do not seem to be MAP but that are clearly evolutionarily related, such as mouse proliferation-associated protein 1 and fission yeast curved DNA-binding protein. For each of these subfamilies, a specific signature pattern that includes residues known to be involved in cobalt-binding has been developed.
[1224] Consensus pattern: [MFY]-x-G-H-G-[LIVMC]-[GSH]-x(3)-H-x(4)-[LIVM]-x-[HN]-[YWV] [H is a cobalt ligand]
Consensus pattern: [DA]-[LIVMY]-x-K-[LIVM]-D-x-G-x-[HQ]-[LIVM]-[DNS]-G-x(3)-[DN] [The second D and the last D/N are cobalt ligands]

[1225] [ 1] Arfin S.M., Kendall R.L., Hall L., Weaver L.H., Stewart A.E., Matthews B.W., Bradshaw R.A. Proc. Natl. Acad. Sci. U.S.A. 92:7714-7718(1995).
[ 2] Keeling P.J., Doolittle W.F. Trends Biochem. Sci. 21:285-286(1996).
[ 3] Roderick S.L., Mathews B.W. Biochemistry 32:3907-3912(1993).
[ 4] Rawlings N.D., Barrett A.J. Meth. Enzymol. 248:183-228(1995).

[1226] 451. Cytochrome P450 cysteine heme-iron ligand signature 

Cytochrome P450's [1,2,3,E1] are a group of enzymes involved in the oxidative metabolism of a high number of natural compounds (such as steroids, fatty acids, prostaglandins, leukotrienes, etc) as well as drugs, carcinogens and mutagens. Based on sequence similarities, P450's have been classified into about forty different families [4,5]. P450's are proteins of 400 to 530 amino acids; the only exception is Bacillus BM-3 (CYP102) which is a protein of 1048 residues that contains an N-terminal P450 domain followed by a reductase domain. P450's are heme proteins. A conserved cysteine residue in the C-terminal part of P450's is involved in binding the heme iron in the fifth coordination site. From a region around this residue, a ten residue signature was developed specific to P450's.

[1227] Consensus pattern: [FW]-[SGNH]-x-[GD]-x-[RHPT]-x-C-[LIVMFAP]-[GAD] [C is the heme iron ligand]
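
By way of example, the ten residue signature can be written directly as a regular expression, and the position of the heme iron ligand can be read from each hit. This is only a sketch: the function name and the short test sequence are invented for illustration and do not correspond to any real P450 entry.

```python
import re

# [FW]-[SGNH]-x-[GD]-x-[RHPT]-x-C-[LIVMFAP]-[GAD], written as a regex
P450_SIGNATURE = re.compile(r"[FW][SGNH].[GD].[RHPT].C[LIVMFAP][GAD]")

def heme_ligand_positions(seq):
    """Return the 1-based position of the signature cysteine for every hit."""
    # The cysteine is the 8th residue of the ten residue signature.
    return [m.start() + 8 for m in P450_SIGNATURE.finditer(seq)]

toy = "AAFGAGPRNCIGAA"                  # made-up sequence containing the signature
mutant = toy.replace("NCIG", "NSIG")    # ligand cysteine replaced by serine

positions = heme_ligand_positions(toy)
mutant_positions = heme_ligand_positions(mutant)
```

Replacing the cysteine removes the hit, which reflects the fact that the pattern is anchored on the ligand residue.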

[ 1] Nebert D.W., Gonzalez F.J. Annu. Rev. Biochem. 56:945-993(1987).
[ 2] Coon M.J., Ding X., Pernecky S.J., Vaz A.D.N. FASEB J. 6:669-673(1992).
[ 3] Guengerich F.P. J. Biol. Chem. 266:10019-10022(1991).
[ 4] Nelson D.R., Kamataki T., Waxman D.J., Guengerich F.P., Estabrook R.W., Feyereisen R., Gonzalez F.J., Coon M.J., Gunsalus I.C., Gotoh O., Okuda K., Nebert D.W. DNA Cell Biol. 12:1-51(1993).
[ 5] Degtyarenko K.N., Archakov A.I. FEBS Lett. 332:1-8(1993).

[1228] 452. (Pec Lyase) Pectate lyase 

This enzyme forms a right handed beta helix structure. Pectate lyase is an enzyme involved in the maceration and soft rotting of plant tissue.

[1229] [1] Yoder MD, Keen NT, Jurnak F. Science 1993;260:1503-1507.
[1230] 453. (pep M24) Aminopeptidase P and proline dipeptidase signature (pep1)

Aminopeptidase P (EC 3.4.11.9) is the enzyme responsible for the release of any N-terminal amino acid adjacent to a proline residue. Proline dipeptidase (EC 3.4.13.9) (prolidase) splits dipeptides with a prolyl residue in the carboxyl terminal position. Bacterial aminopeptidase P II (gene pepP) [1], proline dipeptidase (gene pepQ) [2], and human proline dipeptidase (gene PEPD) [3] are evolutionarily related. These proteins are manganese metalloenzymes. Yeast hypothetical proteins YER078c and YFR006w and Mycobacterium tuberculosis hypothetical protein MtCY49.29c also belong to this family. As a signature pattern for these enzymes, a conserved region was selected that contains three histidine residues.

[1231] Consensus pattern: [HA]-[GSYR]-[LIVMT]-[SG]-H-x-[LIV]-G-[LIVM]-x-[IV]-H-[DE]

[ 1] Yoshimoto T., Tone H., Honda T., Osatomi K., Kobayashi R., Tsuru D. J. Biochem. 105:412-416(1989).
[ 2] Nakahigashi K., Inokuchi H. Nucleic Acids Res. 18:6439-6439(1990).
[ 3] Endo F., Tanoue A., Nakai H., Hata A., Indo Y., Titani K., Matsuda I. J. Biol. Chem. 264:4476-4481(1989).
[ 4] Rawlings N.D., Barrett A.J. Meth. Enzymol. 248:183-228(1995).

[1232] Methionine aminopeptidase signatures (pep2)

Methionine aminopeptidase (EC 3.4.11.18) (MAP) is responsible for the removal of the amino-terminal (initiator) methionine from nascent eukaryotic cytosolic and cytoplasmic prokaryotic proteins if the penultimate amino acid is small and uncharged. All MAP studied to date are monomeric proteins that require cobalt ions for activity. Two subfamilies of MAP enzymes are known to exist [1,2]. While being evolutionarily related, they only share a limited amount of sequence similarity, mostly clustered around the residues shown, in the Escherichia coli MAP [3], to be involved in cobalt-binding. The first family consists of enzymes from prokaryotes as well as eukaryotic MAP-1, while the second group is made up of archaebacterial MAP and eukaryotic MAP-2. The second subfamily also includes proteins which do not seem to be MAP but that are clearly evolutionarily related, such as mouse proliferation-associated protein 1 and fission yeast curved DNA-binding protein. For each of these subfamilies, a specific signature pattern was developed that includes residues known to be involved in cobalt-binding.

[1233] Consensus pattern: [MFY]-x-G-H-G-[LIVMC]-[GSH]-x(3)-H-x(4)-[LIVM]-x-[HN]-[YWV] [H is a cobalt ligand]
Consensus pattern: [DA]-[LIVMY]-x-K-[LIVM]-D-x-G-x-[HQ]-[LIVM]-[DNS]-G-x(3)-[DN] [The second D and the last D/N are cobalt ligands]

[ 1] Arfin S.M., Kendall R.L., Hall L., Weaver L.H., Stewart A.E., Matthews B.W., Bradshaw R.A. Proc. Natl. Acad. Sci. U.S.A. 92:7714-7718(1995).
[ 2] Keeling P.J., Doolittle W.F. Trends Biochem. Sci. 21:285-286(1996).
[ 3] Roderick S.L., Mathews B.W. Biochemistry 32:3907-3912(1993).
[ 4] Rawlings N.D., Barrett A.J. Meth. Enzymol. 248:183-228(1995).

[1234] 454. Peroxidases signatures

Peroxidases (EC 1.11.1.-) [1] are heme-binding enzymes that carry out a variety of biosynthetic and degradative functions using hydrogen peroxide as the electron acceptor. Peroxidases are widely distributed throughout bacteria, fungi, plants and vertebrates. In peroxidases the heme prosthetic group is protoporphyrin IX and the fifth ligand of the heme iron is a histidine (known as the proximal histidine). Another histidine residue (the distal histidine) serves as an acid-base catalyst in the reaction between hydrogen peroxide and the enzyme. The regions around these two active site residues are more or less conserved in a majority of peroxidases [2,3]. The enzymes in which one or both of these regions can be found are listed below.

- Yeast cytochrome c peroxidase (EC 1.11.1.5).
- Myeloperoxidase (EC 1.11.1.7) (MPO). MPO is found in granulocytes and monocytes and plays a major role in the oxygen-dependent microbicidal system of neutrophils.
- Lactoperoxidase (EC 1.11.1.7) (LPO). LPO is a milk protein which acts as an antimicrobial agent.
- Eosinophil peroxidase (EC 1.11.1.7) (EPO). An enzyme found in the cytoplasmic granules of eosinophils.
- Thyroid peroxidase (EC 1.11.1.8) (TPO). TPO plays a central role in the biosynthesis of thyroid hormones. It catalyzes the iodination and coupling of the hormonogenic tyrosines in thyroglobulin to yield the thyroid hormones T3 and T4.
- Fungal ligninases. Ligninase catalyzes the first step in the degradation of lignin. It depolymerizes lignin by catalyzing the C(alpha)-C(beta) cleavage of the propyl side chains of lignin.
- Plant peroxidases (EC 1.11.1.7). Plants express a large number of isozymes of peroxidases. Some of them play a role in cell suberization by catalyzing the deposition of the aromatic residues of suberin on the cell wall, some are expressed as a defense response toward wounding, others are involved in the metabolism of auxin and the biosynthesis of lignin.
- Prokaryotic catalase-peroxidases. Some bacterial species produce enzymes that exhibit both catalase and broad-spectrum peroxidase activities [4]. Examples of such enzymes are: catalase HP I from Escherichia coli (gene katG) and perA from Bacillus stearothermophilus.

[1235] Consensus pattern: [DET]-[LIVMTA]-x(2)-[LIVM]-[LIVMSTAG]-[SAG]-[LIVMSTAG]-H-[STA]-[LIVMFY] [H is the proximal heme-binding ligand]

Consensus pattern: [SGATV]-x(3)-[LIVMA]-R-[LIVMA]-x-[FW]-H-x-[SAC] [H is an active site residue]

[ 1] Dawson J.H. Science 240:433-439(1988).
[ 2] Kimura S., Ikeda-Saito M. Proteins 3:113-120(1988).
[ 3] Henrissat B., Saloheimo M., Lavaitte S., Knowles J.K.C. Proteins 8:251-257(1990).
[ 4] Welinder K.G. Biochim. Biophys. Acta 1080:215-220(1991).

[1236] 455. pfkB family of carbohydrate kinases signatures 

It has been shown [1,2,3] that the following carbohydrate and purine kinases are evolutionarily related and can be grouped into a single family, which is known [1] as the 'pfkB family':

- Fructokinase (EC 2.7.1.4) (gene scrK).
- 6-phosphofructokinase isozyme 2 (EC 2.7.1.11) (phosphofructokinase-2) (gene pfkB). pfkB is a minor phosphofructokinase isozyme in Escherichia coli and is not evolutionarily related to the major isozyme (gene pfkA). Plant 6-phosphofructokinases also belong to this family.
- Ribokinase (EC 2.7.1.15) (gene rbsK).
- Adenosine kinase (EC 2.7.1.20) (gene ADK).
- 2-dehydro-3-deoxygluconokinase (EC 2.7.1.45) (gene kdgK).
- 1-phosphofructokinase (EC 2.7.1.56) (fructose 1-phosphate kinase) (gene fruK).
- Inosine-guanosine kinase (EC 2.7.1.73) (gene gsk).
- Tagatose-6-phosphate kinase (EC 2.7.1.144) (phosphotagatokinase) (gene lacC).
- Escherichia coli hypothetical protein yeiC.
- Escherichia coli hypothetical protein yeiI.
- Escherichia coli hypothetical protein yhfQ.
- Escherichia coli hypothetical protein yihV.
- Bacillus subtilis hypothetical protein yxdC.
- Yeast hypothetical protein YJR105w.

All the above kinases are proteins of from 280 to 430 amino acid residues that share a few regions of sequence similarity. Two of these regions were selected as signature patterns. The first pattern is based on a region rich in glycine which is located in the N-terminal section of these enzymes, while the second pattern is based on a conserved region in the C-terminal section.
[1237] Consensus pattern: [AG]-G-x(0,1)-[GAP]-x-N-x-[STA]-x(6)-[GS]-x(9)-G
Consensus pattern: [DNSK]-[PSTV]-x-[SAG](2)-[GD]-D-x(3)-[SAGV]-[AG]-[LIVMFYA]-[LIVMSTAP]
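
The two pfkB patterns can be checked together, with the optional residue x(0,1) expressed as a zero-or-one repeat. The following sketch assumes hand-written regular expressions and a made-up protein assembled from two toy motifs; the names used are illustrative only.

```python
import re

# N-terminal, glycine-rich pattern; x(0,1) becomes .{0,1}
PFKB_N = re.compile(r"[AG]G.{0,1}[GAP].N.[STA].{6}[GS].{9}G")
# C-terminal pattern; [SAG](2) becomes [SAG]{2}
PFKB_C = re.compile(r"[DNSK][PSTV].[SAG]{2}[GD]D.{3}[SAGV][AG][LIVMFYA][LIVMSTAP]")

def pfkb_hits(seq):
    """Return the 0-based start of each signature, or None when it is absent."""
    n_hit = PFKB_N.search(seq)
    c_hit = PFKB_C.search(seq)
    return (n_hit.start() if n_hit else None,
            c_hit.start() if c_hit else None)

n_motif = "GGKGANQAVAAARLGAKVAFIGKVG"   # toy motif with the optional residue present
c_motif = "DTTGAGDAFVGAFL"              # toy motif for the second pattern
toy_protein = "M" + n_motif + "E" * 20 + c_motif + "K"

n_start, c_start = pfkb_hits(toy_protein)

# The same N-terminal motif without the optional residue still matches.
n_short = PFKB_N.search("GGGANQAVAAARLGAKVAFIGKVG")
```

Requiring the first hit to lie upstream of the second is one simple way to reflect the N-terminal and C-terminal locations described above.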

[ 1] Wu L.-F., Reizer A., Reizer J., Cai B., Tomich J.M., Saier M.H. Jr. J. Bacteriol. 173:3117-3127(1991).
[ 2] Orchard L.M.D., Kornberg H.L. Proc. R. Soc. Lond., B, Biol. Sci. 242:87-90(1990).
[ 3] Blatch G.L., Scholle R.R., Woods D.R. Gene 95:17-23(1990).

[1238] 456. Phospholipase A2 active sites signatures 

Phospholipase A2 (EC 3.1.1.4) (PA2) [1,2] is an enzyme which releases fatty acids from the second carbon group of
glycerol. PA2's are small and rigid proteins of 120 amino-acid residues that have four to seven disulfide bonds. PA2
binds a calcium ion which is required for activity. The side chains of two conserved residues, a histidine and an aspartic
acid, participate in a 'catalytic network'. Many PA2's have been sequenced from snakes, lizards, bees and mammals. 
In the latter, there are at least four forms: pancreatic, membrane-associated as well as two less characterized forms. 
The venom of most snakes contains multiple forms of PA2. Some of them are presynaptic neurotoxins which inhibit 
neuromuscular transmission by blocking acetylcholine release from the nerve termini. Two different signature patterns 
were derived for PA2's. The first is centered on the active site histidine and contains three cysteines involved in disulfide 
bonds. The second is centered on the active site aspartic acid and also contains three cysteines involved in disulfide 
bonds. 

[1239] Consensus pattern: C-C-x(2)-H-x(2)-C [H is the active site residue]. This pattern will not detect some snake
toxins homologous with PA2 but which have lost their catalytic activity, as well as otoconin-22, a Xenopus protein from
the aragonitic otoconia which is also unlikely to be enzymatically active.

Consensus pattern: [LIVMA]-C-{LIVMFYWPCST}-C-D-x(5)-C [D is the active site residue]. This pattern detects the majority of functional and non-functional PA2's. Undetected sequences are bee PA2, gila monster PA2's, PA2 PL-X from habu and PA2 PA-5 from mulga.
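
The second PA2 pattern uses braces to exclude residues at one position. A brief sketch of how both patterns might be tested is given below; the braces are rendered as a negated character class, and the short peptides are invented solely to exercise the expressions.

```python
import re

# C-C-x(2)-H-x(2)-C, centered on the active site histidine
PA2_HIS = re.compile(r"CC.{2}H.{2}C")
# [LIVMA]-C-{LIVMFYWPCST}-C-D-x(5)-C, centered on the active site aspartic acid;
# {LIVMFYWPCST} means "any residue except these", i.e. a negated class.
PA2_ASP = re.compile(r"[LIVMA]C[^LIVMFYWPCST]CD.{5}C")

def pa2_signatures(seq):
    """Return (histidine_site_found, aspartate_site_found) for a sequence."""
    return (bool(PA2_HIS.search(seq)), bool(PA2_ASP.search(seq)))

his_peptide = "CCQTHDNC"         # made-up peptide matching the first pattern
asp_peptide = "ACECDKAAAIC"      # made-up peptide matching the second pattern
excluded = "ACLCDKAAAIC"         # Leu at the excluded position, so no match

both = pa2_signatures("G" + his_peptide + "GGG" + asp_peptide)
only_his = pa2_signatures(his_peptide)
rejected = pa2_signatures(excluded)
```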

[ 1] Davidson F.F., Dennis E.A. J. Mol. Evol. 31:228-238(1990). 

[ 2] Gomez R, Vandermeers A., Vandermeers-Piret M.-C, Herzog R., Rathe J., Stievenart M., Winand J., Chris- 
tophe J. Eur. J. Biochem. 186:23-33(1989). 

[1240] 457. Phosphorylase pyridoxal-phosphate attachment site. Phosphorylases (EC 2.4.1.1) [1] are important allosteric enzymes in carbohydrate metabolism. They catalyze the formation of glucose 1-phosphate from polyglucose such as glycogen, starch or maltodextrin. Enzymes from different sources differ in their regulatory mechanisms and their natural substrates. However, all known phosphorylases share catalytic and structural properties. They are pyridoxal-phosphate dependent enzymes; the pyridoxal-P group is attached to a lysine residue around which the sequence is highly conserved and can be used as a signature pattern to detect this class of enzymes.
[1241] Consensus pattern: E-A-[SC]-G-x-[GS]-x-M-K-x(2)-[LM]-N [K is the pyridoxal-P attachment site]
[ 1] Fukui T., Shimomura S., Nakano K. Mol. Cell. Biochem. 42:120-144(1982).
[1242] 458. Protein kinases signatures and profile 

Eukaryotic protein kinases [1 to 5] are enzymes that belong to a very extensive family of proteins which share a conserved catalytic core common to both serine/threonine and tyrosine protein kinases. There are a number of conserved regions in the catalytic domain of protein kinases. Two of these regions were selected to build signature patterns. The first region, which is located in the N-terminal extremity of the catalytic domain, is a glycine-rich stretch of residues in the vicinity of a lysine residue, which has been shown to be involved in ATP binding. The second region, which is located in the central part of the catalytic domain, contains a conserved aspartic acid residue which is important for the catalytic activity of the enzyme [6]. Two signature patterns were derived for that region: one specific for serine/threonine kinases and the other for tyrosine kinases. A profile was also developed which is based on the alignment in [1] and covers the entire catalytic domain.

[1243] Consensus pattern: [LIV]-G-{P}-G-{P}-[FYWMGSTNH]-[SGA]-{PW}-[LIVCAT]-{PD}-x-[GSTACLIVMFY]-x(5,18)-[LIVMFYWCSTAR]-[AIVP]-[LIVMFAGCKR]-K [K binds ATP]. The majority of known protein kinases belong to the class detected by this pattern, but it fails to find a number of them, especially viral kinases which are quite divergent in this region and are completely missed by this pattern.

Consensus pattern: [LIVMFYC]-x-[HY]-x-D-[LIVMFY]-K-x(2)-N-[LIVMFYCT](3) [D is an active site residue]. Most serine/threonine specific protein kinases belong to this class detected by the pattern, with 10 exceptions (half of them viral kinases) and also Epstein-Barr virus BGLF4 and Drosophila ninaC, which have respectively Ser and Arg instead of the conserved Lys and which are therefore detected by the tyrosine kinase specific pattern described below.
[1244] Consensus pattern: [LIVMFYC]-x-[HY]-x-D-[LIVMFY]-[RSTAC]-x(2)-N-[LIVMFYC](3) [D is an active site residue]. All tyrosine specific protein kinases with the exception of human ERBB3 and mouse blk belong to this class detected by the pattern. This pattern will also detect most bacterial aminoglycoside phosphotransferases [8,9] and herpesviruses ganciclovir kinases [10], which are proteins structurally and evolutionarily related to protein kinases. This profile also detects receptor guanylate cyclases and 2-5A-dependent ribonucleases. Sequence similarities between these two families and the eukaryotic protein kinase family have been noticed before. It also detects Arabidopsis thaliana kinase-like protein TMKL1 which seems to have lost its catalytic activity. If a protein analyzed includes the two protein kinase signatures, the probability of it being a protein kinase is close to 100%. Eukaryotic-type protein kinases have also been found in prokaryotes such as Myxococcus xanthus [11] and Yersinia pseudotuberculosis.
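
The rule that a protein carrying both kinase signatures is very likely a protein kinase can be sketched as a two-step check: first the ATP-binding region, then either of the two active-site patterns. The regular expressions below are hand translations of the three consensus patterns; the function names and the toy sequences are assumptions introduced for this illustration.

```python
import re

# ATP-binding region signature; {P} and similar exclusions become negated classes.
ATP_REGION = re.compile(
    r"[LIV]G[^P]G[^P][FYWMGSTNH][SGA][^PW][LIVCAT][^PD].[GSTACLIVMFY]"
    r".{5,18}[LIVMFYWCSTAR][AIVP][LIVMFAGCKR]K"
)
# Serine/threonine kinase active-site signature (Lys after the D-[LIVMFY] pair).
SER_THR_SITE = re.compile(r"[LIVMFYC].[HY].D[LIVMFY]K.{2}N[LIVMFYCT]{3}")
# Tyrosine kinase active-site signature ([RSTAC] at the same position).
TYR_SITE = re.compile(r"[LIVMFYC].[HY].D[LIVMFY][RSTAC].{2}N[LIVMFYC]{3}")

def kinase_signatures(seq):
    """Report which of the three signatures occur in the sequence."""
    return {
        "atp_binding": bool(ATP_REGION.search(seq)),
        "ser_thr": bool(SER_THR_SITE.search(seq)),
        "tyr": bool(TYR_SITE.search(seq)),
    }

def has_both_signatures(seq):
    """True when the ATP-binding signature and one active-site signature are present."""
    found = kinase_signatures(seq)
    return found["atp_binding"] and (found["ser_thr"] or found["tyr"])

atp_motif = "LGTGSFGRVMLVKHKETGNHYAMK"   # toy glycine-rich stretch ending in Lys
toy_ser_thr = "MA" + atp_motif + "Q" * 30 + "LIYRDLKPENLLI" + "AA"
toy_tyr = "MA" + atp_motif + "Q" * 30 + "FVHRDLAARNCMV" + "AA"

ser_thr_report = kinase_signatures(toy_ser_thr)
tyr_report = kinase_signatures(toy_tyr)
```

The only difference between the two active-site expressions is the residue class after D-[LIVMFY], which is why a serine/threonine kinase with a substituted lysine can be picked up by the tyrosine kinase pattern, as noted above.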

[ 1] Hanks S.K., Hunter T. FASEB J. 9:576-596(1995).
[ 2] Hunter T. Meth. Enzymol. 200:3-37(1991).
[ 3] Hanks S.K., Quinn A.M. Meth. Enzymol. 200:38-62(1991).
[ 4] Hanks S.K. Curr. Opin. Struct. Biol. 1:369-383(1991).
[ 5] Hanks S.K., Quinn A.M., Hunter T. Science 241:42-52(1988).
[ 6] Knighton D.R., Zheng J., Ten Eyck L.F., Ashford V.A., Xuong N.-H., Taylor S.S., Sowadski J.M. Science 253:407-414(1991).
[ 7] Bairoch A., Claverie J.-M. Nature 331:22(1988).
[ 8] Benner S. Nature 329:21-21(1987).
[ 9] Kirby R. J. Mol. Evol. 30:489-492(1992).
[10] Littler E., Stuart A.D., Chee M.S. Nature 358:160-162(1992).
[11] Munoz-Dorado J., Inouye S., Inouye M. Cell 67:995-1006(1991).

[1245] Receptor tyrosine kinase class II signature

A number of growth factors stimulate mitogenesis by interacting with a family of cell surface receptors which possess an intrinsic, ligand-sensitive, protein tyrosine kinase activity [1]. These receptor tyrosine kinases (RTK) all share the same topology: an extracellular ligand-binding domain, a single transmembrane region and a cytoplasmic kinase domain. However they can be classified into at least five groups. The prototype for class II RTK's is the insulin receptor, a heterotetramer of two alpha and two beta chains linked by disulfide bonds. The alpha and beta chains are cleavage products of a precursor molecule. The alpha chain contains the ligand binding site, the beta chain traverses the membrane and contains the tyrosine protein kinase domain. The receptors currently known to belong to class II are:

- Insulin receptor from vertebrates.
- Insulin growth factor I receptor from mammals.
- Insulin receptor-related receptor (IRR), which is most probably a receptor for a peptide belonging to the insulin family.
- Insects insulin-like receptors.
- Molluscan insulin-related peptide(s) receptor (MIP-R).
- Insulin-like peptide receptor from Branchiostoma lanceolatum.
- The Drosophila developmental protein sevenless, a putative receptor for positional information required for the formation of the R7 photoreceptor cells.
- The trk family of receptors (NTRK1, NTRK2 and NTRK3), which are high affinity receptors for nerve growth factor and related neurotrophic factors (BDNF and NT-3).

And the following uncharacterized receptors:

- ROS.
- LTK (TYK1).
- EDDR1 (cak, TRKE, RTK6).
- NTRK3 (Tyro10, TKT).
- A sponge putative receptor tyrosine kinase.

While only the insulin and the insulin growth factor I receptors are known to exist in the tetrameric conformation specific to class II RTK's, all the above proteins share extensive homologies in their kinase domain, especially around the putative site of autophosphorylation. Hence, a signature pattern was developed for this class of RTK's, which includes the tyrosine residue, itself probably autophosphorylated.
[1246] Consensus pattern: [DN]-[LIV]-Y-x(3)-Y-Y-R [The second Y is the autophosphorylation site]
[1247] [ 1] Yarden Y., Ullrich A. Annu. Rev. Biochem. 57:443-478(1988).
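
As a hedged illustration of the class II signature, the sketch below locates the pattern [DN]-[LIV]-Y-x(3)-Y-Y-R and reports the second tyrosine of each hit. The function name and the short peptide are invented for this example.

```python
import re

# [DN]-[LIV]-Y-x(3)-Y-Y-R; the second Y is captured so its position can be reported.
RTK_CLASS_II = re.compile(r"[DN][LIV]Y.{3}(Y)YR")

def autophosphorylation_sites(seq):
    """Return 1-based positions of the second tyrosine in each signature hit."""
    return [m.start(1) + 1 for m in RTK_CLASS_II.finditer(seq)]

toy = "GMTRDIYETDYYRKGG"    # made-up peptide containing the signature once
sites = autophosphorylation_sites(toy)
```

Capturing the residue of interest in a group avoids having to count offsets by hand when the pattern contains a variable-length gap elsewhere.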
[1248] Receptor tyrosine kinase class III signature 

A number of growth factors stimulate mitogenesis by interacting with a family of cell surface receptors which possess an intrinsic, ligand-sensitive, protein tyrosine kinase activity [1]. These receptor tyrosine kinases (RTK) all share the same topology: an extracellular ligand-binding domain, a single transmembrane region and a cytoplasmic kinase domain. However they can be classified into at least five groups. The class III RTK's are characterized by the presence of five to seven immunoglobulin-like domains [2] in their extracellular section. Their kinase domain differs from that of other RTK's by the insertion of a stretch of 70 to 100 hydrophilic residues in the middle of this domain. The receptors currently known to belong to class III are:

- Platelet-derived growth factor receptor (PDGF-R). PDGF-R exists as a homo- or heterodimer of two related chains: alpha and beta [3].
- Macrophage colony stimulating factor receptor (CSF-1-R) (also known as the fms oncogene).
- Stem cell factor (mast cell growth factor) receptor (also known as the kit oncogene).
- Vascular endothelial growth factor (VEGF) receptors Flt-1 and Flk-1/KDR [4].
- Fl cytokine receptor Flk-2/Flt-3 [5].
- The putative receptor Flt-4 [7].

A signature pattern was developed for this class of RTK's which is based on a conserved region in the kinase domain.

[1249] Consensus pattern: G-x-H-x-N-[LIVM]-V-N-L-L-G-A-C-T

[ 1] Yarden Y., Ullrich A. Annu. Rev. Biochem. 57:443-478(1988).
[ 2] Hunkapiller T., Hood L. Adv. Immunol. 44:1-63(1989).
[ 3] Lee K.-H., Bowen-Pope D.F., Reed R.R. Mol. Cell. Biol. 10:2237-2246(1990).
[ 4] Terman B.I., Dougher-Vermazen M., Carrion M.E., Dimitrov D., Armellino D.C., Gospodarowicz D., Bohlen P. Biochem. Biophys. Res. Commun. 187:1579-1586(1992).
[ 5] Lyman S.D., James L., Vanden Bos T., de Vries P., Brasel K., Gliniak B., Hollingsworth L.T., Picha K.S., McKenna H.J., Splett R.R. Cell 75:1157-1167(1993).
[ 6] Galland F., Karamysheva A., Pebusque M.J., Borg J.P., Rottapel R., Dubreuil P., Rosnet O., Birnbaum D. Oncogene 8:1233-1240(1993).

[1250] Receptor tyrosine kinase class V signatures 

A number of growth factors stimulate mitogenesis by interacting with a family of cell surface receptors which possess an intrinsic, ligand-sensitive, protein tyrosine kinase activity [1]. These receptor tyrosine kinases (RTK) all share the same topology: an extracellular ligand-binding domain, a single transmembrane region and a cytoplasmic kinase domain. However they can be classified into at least five groups on the basis of sequence similarities. The extracellular domain of class V RTK's consists of a region of about 300 amino acids, amongst which are 16 conserved cysteines probably involved in disulfide bonds; this region is followed by two copies of a fibronectin type III domain. The ligands for these receptors are proteins of about 200 to 300 residues collectively known as Ephrins. The receptors currently known to belong to class V are [2,3,E1]:

- EPHA1 (Eph-1; Esk).
- EPHA2 (Eck; Mpk-5; Sek-2).
- EPHA3 (Etk-1; Hek; Mek4; Tyro4; Rek4; Cek4).
- EPHA4 (Sek; Hek8; Mpk-3; Cek8).
- EPHA5 (Ehk-1; Hek7; Bsk; Cek7).
- EPHA6 (Ehk-2).
- EPHA7 (Ehk-3; Hek11; Mdk-1; Ebk).
- EPHA8 (Eek).
- EPHB1 (Eph-2; Elk; Net).
- EPHB2 (Eph-3; Hek5; Drt; Erk; Nuk; Sek-3; Cek5; Qek5).
- EPHB3 (Hek-2; Mdk-5).
- EPHB4 (Htk; Mdk-2; Myk-1).
- EPHB5 (Cek9).

The EPHA subtype receptors bind to GPI-anchored ephrins while the EPHB subtype receptors bind to type-I membrane ephrins. Two signature patterns were developed for this class of RTK's, which each include some of the conserved cysteine residues.
[1251] Consensus pattern: F-x-[DN]-x-[GAW]-[GA]-C-[LIVM^^
[PSAW] [The two C's are probably involved in disulfide bonds]

Consensus pattern: C-x(2)-[DE]-G-[DEQ]-W-x(2,3)-[PAQ]-[LIVMT]-[GT]-x-C-x-C-x(2)-G-[HFY]-[EQ] [The three C's are probably involved in disulfide bonds]

[ 1] Yarden Y., Ullrich A. Annu. Rev. Biochem. 57:443-478(1988).
[ 2] Sajjadi F.G., Pasquale E.B., Subramani S. New Biol. 3:769-778(1991).
[ 3] Wicks I.P., Wilkinson D., Salvaris E., Boyd A.W. Proc. Natl. Acad. Sci. U.S.A. 89:1611-1615(1992).

[1252] 459. Protein kinase C terminal domain 
[1253] 460. Plant thionins signature 

Thionins are small, basic, plant proteins generally toxic to animal cells [1]. They seem to exert their toxic effect at the level of the cell membrane but their exact function is not known. They consist of a polypeptide chain of forty five to fifty amino acids with three to four internal disulfide bonds. They are found in seeds but also in the cell wall of leaves [2]. Thionins are processed from larger precursor proteins. Crambin [4], a hydrophobic plant seed protein, also belongs to this family. The pattern to detect this family of proteins includes three of the six cysteine residues involved in disulfide bonds.

xxCCxxxxxxxxxxxCxxxxxxxxxCxxxCxxCxxxxxCxxxxxxxx

'C': conserved cysteine involved in a disulfide bond. '*': position of the pattern.

[1254] Consensus pattern: C-C-x(5)-R-x(2)-[FY]-x(2)-C [The three C's are involved in disulfide bonds]. The proteins from the gamma-thionin family are not related to the above proteins and are described in a separate section.
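
A small sketch of the thionin pattern follows; it captures the three cysteines of the signature so that their positions can be listed. The peptide used is a made-up example, and the function name is hypothetical.

```python
import re

# C-C-x(5)-R-x(2)-[FY]-x(2)-C with each of the three cysteines captured
THIONIN = re.compile(r"(C)(C).{5}R.{2}[FY].{2}(C)")

def signature_cysteines(seq):
    """Return 1-based positions of the three signature cysteines, or None."""
    hit = THIONIN.search(seq)
    if hit is None:
        return None
    return tuple(hit.start(group) + 1 for group in (1, 2, 3))

toy = "KSCCPSTTARNIYNTCRL"    # invented peptide, not a database sequence
cysteines = signature_cysteines(toy)
```

For this toy peptide the three cysteines fall at positions 3, 4 and 16, the same spacing as the first three cysteines in the schematic representation of the family.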

[ 1] Vernon L.P., Evett G.E., Zeikus R.D., Gray W.R. Arch. Biochem. Biophys. 238:18-29(1985).
[ 2] Bohlmann H., Clausen S., Behnke S., Giese K., Hiller C., Reimann-Phillip U., Schrader G., Barkholt V., Apel K. EMBO J. 7:1559-1565(1988).
[ 3] Bohlmann H., Apel K. Mol. Gen. Genet. 207:446-454(1987).
[ 4] Teeter M.M., Mazer J.A., L'Italien J.J. Biochemistry 20:5437-5443(1981).

[1255] 461. Polyprenyl synthetases signatures 

A variety of isoprenoid compounds are synthesized by various organisms. For example in eukaryotes the isoprenoid biosynthetic pathway is responsible for the synthesis of a variety of end products including cholesterol, dolichol, ubiquinone or coenzyme Q. In bacteria this pathway leads to the synthesis of isopentenyl tRNA, isoprenoid quinones, and sugar carrier lipids. Among the enzymes that participate in that pathway are a number of polyprenyl synthetase enzymes which catalyze a 1'4-condensation between 5 carbon isoprene units. Currently the sequence of some of these enzymes is known:

- Eukaryotic farnesyl pyrophosphate synthetase (FPP synthetase) (EC 2.5.1.1 / EC 2.5.1.10) which catalyzes the sequential condensation of isopentenyl pyrophosphate (IPP) with dimethylallyl pyrophosphate (DMAPP), and then with the resultant geranyl pyrophosphate to form farnesyl pyrophosphate. FPP synthetase is a cytoplasmic dimeric enzyme.
- Prokaryotic farnesyl pyrophosphate synthetase (gene ispA).
- Prokaryotic octaprenyl diphosphate synthase (gene ispB).
- Prokaryotic heptaprenyl diphosphate synthase (EC 2.5.1.30).
- Eukaryotic geranylgeranyl pyrophosphate synthetase (GGPP synthetase) (EC 2.5.1.1 / EC 2.5.1.10 / EC 2.5.1.29) which catalyzes the sequential addition of the three molecules of IPP onto DMAPP to form geranylgeranyl pyrophosphate. In plants GGPP synthase is a chloroplast enzyme involved in the biosynthesis of terpenoids; in fungi, such as Neurospora crassa (gene al-3), this enzyme is involved in the biosynthesis of carotenoids.
- Prokaryotic GGPP synthetase, which is involved in the biosynthesis of carotenoids (gene crtE). Such an enzyme is also encoded in the cyanelle genome of Cyanophora paradoxa.
- Eukaryotic hexaprenyl pyrophosphate synthetase, which is involved in the biosynthesis of coenzyme Q and which catalyzes the formation of all trans-polyprenyl pyrophosphates generally ranging in length between 6 and 10 isoprene units depending on the species. HP synthetase is a mitochondrial membrane-associated enzyme.

It has been shown [1 to 5] that all the above enzymes share some regions of sequence similarity. Two of these regions are rich in aspartic-acid residues and could be involved in the catalytic mechanism and/or the binding of the substrates. Signature patterns were developed for both regions. Possible additional members of this family of proteins are:

- Bacillus subtilis spore germination protein C3 (gene gerC3).

Both proteins are most probably also enzymes involved in isoprenoid metabolism [6].

[1256] Consensus pattern: [LIVM](2)-x-D-D-x(2,4)-D-x(4)-R-R-[GH]- 
Consensus pattern: [LIVMFY]-G-x(2)-[FYL]-Q-[LIVM]-x-D-D-[LIVMFY]-x-[DNG]
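
The consensus patterns in this section use a compact notation: `x` for any residue, `[...]` for allowed residues, `{...}` for excluded residues, `(n)` or `(n,m)` for repeat counts, and `<` or `>` to pin a match to the N- or C-terminus. As a minimal sketch, assuming only these elements are present, a pattern can be translated into a regular expression and run against a protein sequence. The function name `prosite_to_regex` and the toy sequence are hypothetical and are not part of the original text.

```python
import re

def prosite_to_regex(pattern: str) -> str:
    """Translate a consensus pattern (as written in this section) into a regex."""
    pattern = pattern.strip().rstrip("-.")  # tolerate a trailing dash or period
    parts = []
    for element in pattern.split("-"):
        element = element.strip()
        if not element:
            continue
        # '<' pins the match to the N-terminus, '>' to the C-terminus.
        prefix = "^" if element.startswith("<") else ""
        suffix = "$" if element.endswith(">") else ""
        element = element.strip("<>")
        # Split off an optional repeat count such as (2) or (2,4).
        m = re.fullmatch(r"(.+?)(?:\((\d+)(?:,(\d+))?\))?", element)
        core, lo, hi = m.group(1), m.group(2), m.group(3)
        if core == "x":
            core = "."                       # any residue
        elif core.startswith("{"):
            core = "[^" + core[1:-1] + "]"   # excluded residues
        # '[...]' classes and single letters are already valid regex syntax.
        count = ""
        if lo:
            count = "{" + lo + ("," + hi if hi else "") + "}"
        parts.append(prefix + core + count + suffix)
    return "".join(parts)

# First polyprenyl synthetase pattern from the text above.
regex = prosite_to_regex("[LIVM](2)-x-D-D-x(2,4)-D-x(4)-R-R-[GH]-")
toy_sequence = "AALLQDDAAADKKKKRRGAA"  # invented for illustration only
match = re.search(regex, toy_sequence)
print(regex, match.start() + 1 if match else None)
```

A regex hit only shows that a sequence is compatible with the pattern; it does not by itself establish membership of the protein family.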

[ 1] Ashby M.N., Edwards P.A. J. Biol. Chem. 265:13157-13164(1990).
[ 2] Fujisaki S., Hara H., Nishimura Y., Horiuchi K., Nishino T. J. Biochem. 108:995-1000(1990).
[ 3] Carattoli A., Romano N., Ballario P., Morelli G., Macino G. J. Biol. Chem. 266:5854-5859(1991).
[ 4] Kuntz M., Roemer S., Suire C., Hugueney P., Weil J.H., Schantz R., Camara B. Plant J. 2:25-34(1992).
[ 5] Math S.K., Hearst J.E., Poulter C.D. Proc. Natl. Acad. Sci. U.S.A. 89:6761-6764(1992).
[ 6] Bairoch A. Unpublished observations (1993).

[1257] 462. Potato inhibitor I family signature 

The potato inhibitor I family is one of the numerous families of serine proteinase inhibitors. Members of this protein
family are found in plants; in the seeds of barley or beans [1,2,3], and in potato or tomato leaves where they accumulate
in response to mechanical damage [4,5]. An inhibitor belonging to this family is also found in leech [6]. It is interesting
to note that, currently, this is the only proteinase inhibitor family to be found both in plant and animal kingdoms.
Structurally these inhibitors are small (60 to 90 residues) and in contrast with other families of protease inhibitors, they lack
disulfide bonds. They have a single inhibitory site. The consensus pattern includes three out of the four residues
conserved in all members of this family and is located in the N-terminal half.

Consensus pattern: [FYW]-P-[EQH]-[LIV](2)-G-x(2)-[STAGV]-x(2)-A- Barley subtilisin-chymotrypsin inhibitor-2b has
Glu instead of Gly. There is a trypsin inhibitor from the cucurbitaceae Momordica charantia [7], which is said to belong
to the potato inhibitor I family but which shows only a very weak similarity with the other members of this family.

[ 1] Svendsen I., Hejgaard J., Chavan J.K. Carlsberg Res. Commun. 49:493-502(1984).
[ 2] Svendsen I., Boisen S., Hejgaard J. Carlsberg Res. Commun. 47:45-53(1982).
[ 3] Nozawa H., Yamagata R, Aizono Y., Yoshikawa M., Iwasaki T. J. Biochem. 106:1003-1008(1989).
[ 4] Cleveland T.E., Thornburg R.W., Ryan C.A. Plant Mol. Biol. 8:199-207(1987).
[ 5] Lee J.S., Brown W.E., Graham J.S., Pearce G., Fox E.A., Dreher T.W., Ahern K.G., Pearson G.D., Ryan C.A.
Proc. Natl. Acad. Sci. U.S.A. 83:7277-7281(1986).
[ 6] Seemuller U., Eulitz M., Fritz H., Strobl A. Hoppe-Seyler's Z. Physiol. Chem. 361:1841-1846(1980).
[ 7] Zeng F.-Y., Qian R.-Q., Wang Y. FEBS Lett. 234:35-38(1988).

[1258] 463. (pp binding) Phosphopantetheine attachment site 

Phosphopantetheine (or pantetheine 4'phosphate) is the prosthetic group of acyl carrier proteins (ACP) in some
multienzyme complexes where it serves as a 'swinging arm' for the attachment of activated fatty acid and amino-acid
groups [1]. Phosphopantetheine is attached to a serine residue in these proteins [2]. ACP proteins or domains have
been found in various enzyme systems which are listed below (references are only provided for recently determined
sequences). - Fatty acid synthetase (FAS), which catalyzes the formation of long-chain fatty acids from acetyl-CoA,
malonyl-CoA and NADPH. Bacterial and plant chloroplast FAS are composed of eight separate subunits which
correspond to the different enzymatic activities; ACP is one of these polypeptides. Fungal FAS consists of two multifunctional
proteins, FAS1 and FAS2; the ACP domain is located in the N-terminal section of FAS2. Vertebrate FAS consists of a
single multifunctional enzyme; the ACP domain is located between the beta-ketoacyl reductase domain and the
C-terminal thioesterase domain [3]. - Polyketide antibiotics synthase enzyme systems. Polyketides are secondary
metabolites produced from simple fatty acids, by microorganisms and plants. ACP is one of the polypeptide components
involved in the biosynthesis of Streptomyces polyketide antibiotics actinorhodin, curamycin, granatacin, monensin,
oxytetracycline and tetracenomycin C. - Bacillus subtilis putative polyketide synthases pksK, pksL and pksM which
respectively contain three, five and one ACP domains. - The multifunctional 6-methylsalicylic acid synthase (MSAS)
from Penicillium patulum. This is a multifunctional enzyme involved in the biosynthesis of a polyketide antibiotic and
which contains an ACP domain in the C-terminal extremity. - Multifunctional mycocerosic acid synthase (gene mas)
from Mycobacterium bovis. - Gramicidin S synthetase I (gene grsA) from Bacillus brevis. This enzyme catalyzes the
first step in the biosynthesis of the cyclic antibiotic gramicidin S. - Tyrocidine synthetase I (gene tycA) from Bacillus
brevis. The reaction carried out by tycA is identical to that catalyzed by grsA. - Gramicidin S synthetase II (gene grsB)
from Bacillus brevis. This enzyme is a multifunctional protein that activates and polymerizes proline, valine, ornithine
and leucine. GrsB contains four ACP domains. - Erythronolide synthase proteins 1, 2 and 3 from Saccharopolyspora
erythraea which is involved in the biosynthesis of the polyketide antibiotic erythromycin. Each of these proteins contains
two ACP domains. - Conidial green pigment synthase from Aspergillus nidulans. - ACV synthetase from various fungi.
This enzyme catalyzes the first step in the biosynthesis of penicillin and cephalosporin. It contains three ACP domains.
- Enterobactin synthetase component F (gene entF) from Escherichia coli. This enzyme is involved in the ATP-dependent
activation of serine during enterobactin (enterochelin) biosynthesis. - Cyclic peptide antibiotic surfactin synthase
subunits 1, 2 and 3 from Bacillus subtilis. Subunits 1 and 2 contain three related domains while subunit 3 only contains
a single domain. - HC-toxin synthetase (gene HTS1) from Cochliobolus carbonum. This enzyme synthesizes HC-toxin,
a cyclic tetrapeptide. HTS1 contains four ACP domains. - Fungal mitochondrial ACP [9], which is part of the respiratory
chain NADH dehydrogenase (complex I). - Rhizobium nodulation protein nodF, which probably acts as an ACP in the
synthesis of the nodulation Nod factor fatty acyl chain. The sequence around the phosphopantetheine attachment site
is conserved in all these proteins and can be used as a signature pattern. A profile was also developed that spans the
complete ACP-like domain.

[1259] Consensus pattern: [DEQGSTALMKRH]-[LIVMFYSTAC]-[GNQ]-[LIVMFYAG]-[DNEKHS]-S-[LIVMST]-{PCFY}-
[STAGCPQLIVMF]-[LIVMATN]-[DENQGTAKRHLM]-[LIVMWSTA]-[LIVGSTACR]-x(2)-[LIVMFA] [S is the pantetheine
attachment site]
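
Because the text states that phosphopantetheine is attached to a serine, a match to this pattern also localizes the candidate attachment residue: the only invariant position is the serine at the sixth place. The sketch below assumes the pattern reads as reconstructed here (in particular that the excluded set is {PCFY} and the thirteenth class is [LIVGSTACR]); the regex is a hand translation, and the function name and toy sequence are hypothetical.

```python
import re

# Hand-translated regex for the phosphopantetheine attachment site pattern.
# {PCFY} in the pattern becomes the negated class [^PCFY].
PP_BINDING = re.compile(
    r"[DEQGSTALMKRH][LIVMFYSTAC][GNQ][LIVMFYAG][DNEKHS]"
    r"S"  # the serine that carries the prosthetic group
    r"[LIVMST][^PCFY][STAGCPQLIVMF][LIVMATN][DENQGTAKRHLM]"
    r"[LIVMWSTA][LIVGSTACR]..[LIVMFA]"
)

def attachment_serines(sequence: str) -> list[int]:
    """Return the 1-based position of the candidate serine for each match."""
    # The serine is the sixth residue of a match, i.e. offset 5 from its start,
    # so its 1-based position is start + 6.
    return [m.start() + 6 for m in PP_BINDING.finditer(sequence)]

toy = "MEEKDLGADSLDTVELVMALEEK"  # invented fragment for illustration only
print(attachment_serines(toy))
```

Proteins with several ACP domains, such as those listed above, would be expected to return several positions, one per domain that satisfies the pattern.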

[ 1] Concise Encyclopedia Biochemistry, Second Edition, Walter de Gruyter, Berlin New-York (1988).
[ 2] Pugh E.L., Wakil S.J. J. Biol. Chem. 240:4727-4733(1965).
[ 3] Witkowski A., Rangan V.S., Randhawa Z.I., Amy C.M., Smith S. Eur. J. Biochem. 198:571-579(1991).
[ 6] Scotti C., Piatti M., Cuzzoni A., Perani P., Tognoni A., Grandi G., Galizzi A., Albertini A.M. Gene 130:65-71(1993).
[ 9] Sackmann U., Zensen R., Rohlen D., Jahnke U., Weiss H. Eur. J. Biochem. 200:463-469(1991).

[1260] 464. (Prenyltrans) Terpene synthases signature 
The following enzymes catalyze mechanistically related reactions which involve the highly complex cyclic
rearrangement of squalene or its 2,3 oxide: - Lanosterol synthase (EC 5.4.99.7) (oxidosqualene-lanosterol cyclase), which
catalyzes the cyclization of (S)-2,3-epoxysqualene to lanosterol, the initial precursor of cholesterol, steroid hormones
and vitamin D in vertebrates and of ergosterol in fungi (gene ERG7). - Cycloartenol synthase (EC 5.4.99.8)
(2,3-epoxysqualene-cycloartenol cyclase), a plant enzyme that catalyzes the cyclization of (S)-2,3-epoxysqualene to
cycloartenol. - Hopene synthase (EC 5.4.99.-) (squalene-hopene cyclase), a bacterial enzyme that catalyzes the cyclization
of squalene into hopene, a key step in hopanoid (triterpenoid) metabolism. These enzymes are evolutionary related [1]
proteins of about 70 to 85 Kd. As a signature pattern, a highly conserved region was selected which is rich in aromatic
residues and which is located in the C-terminal section.

[1261] Consensus pattern: [DE]-G-S-W-x-G-x-W-[GA]-[LIVM]-x-[FY]-x-Y-[GA]

[1262] [ 1] Corey E.J., Matsuda S.P.T., Bartel B. Proc. Natl. Acad. Sci. U.S.A. 90:11628-11632(1993). 
[1263] 465. Prion protein signatures 

Prion protein (PrP) [1,2,3] is a small glycoprotein found in high quantity in the brains of humans or animals infected
with a number of degenerative neurological diseases such as Kuru, Creutzfeldt-Jacob disease (CJD), scrapie or bovine
spongiform encephalopathy (BSE). PrP is encoded in the host genome and expressed both in normal and infected
cells. It has a tendency to aggregate yielding polymers called rods. Structurally, PrP is a protein consisting of a signal
peptide followed by an N-terminal domain that contains tandem repeats of a short motif (PHGGGWGQ in mammals,
PHNPGY in chicken), itself followed by a highly conserved domain. Finally comes a C-terminal hydrophobic domain
post-translationally removed when PrP is attached to the extracellular side of the cell membrane by a GPI-anchor. The
structure of PrP is shown in the following schematic representation:

[Schematic not recoverable from the source; its surviving labels are: Sig, Tandem repeats, C, C, GPI.]
'C': conserved cysteine involved in a disulfide bond. '*': position of the patterns.

As signature pattern for PrP, a perfectly conserved
alanine- and glycine-rich region of 16 residues was selected as well as a region centered on the second cysteine
involved in the disulfide bond.

[1264] Consensus pattern: A-G-A-A-A-A-G-A-V-V-G-G-L-G-G-Y- 

Consensus pattern: E-x-[ED]-x-K-[LIVM](2)-x-[KR]-[LIVM](2)-x-[QE]-M-C-x(2)-Q-Y [C is involved in a disulfide bond]

[ 1] Stahl N., Prusiner S.B. FASEB J. 5:2799-2807(1991). 
[ 2] Brunori M., Chiara Silvestrini M., Pocchiari M. Trends Biochem. Sci. 13:309-313(1988).

[ 3] Prusiner S.B. Annu. Rev. Microbiol. 43:345-374(1989). 

[1265] 466. Cyclophilin-type peptidyl-prolyl cis-trans isomerase signature and profile (pro isomerase)

Cyclophilin [1] is the major high-affinity binding protein in vertebrates for the immunosuppressive drug cyclosporin A
(CSA). It exhibits a peptidyl-prolyl cis-trans isomerase activity (EC 5.2.1.8) (PPIase or rotamase). PPIase is an enzyme
that accelerates protein folding by catalyzing the cis-trans isomerization of proline imidic peptide bonds in oligopeptides
[2]. It is probable that CSA mediates some of its effects via an inhibitory action on PPIase. Cyclophilin is a cytosolic
protein which belongs to a family [3,4,5] that also includes the following isozymes: - Cyclophilin B (or S-cyclophilin), a
PPIase which is retained in an endoplasmic reticulum compartment. - Cyclophilin C, a cytoplasmic PPIase. -
Mitochondrial matrix cyclophilin (cyp3). - A PPIase which seems specific for the folding of rhodopsin and is an integral membrane
protein anchored by a C-terminal transmembrane region. This protein was first characterized in Drosophila (gene
ninaA). - Bacterial periplasmic PPIase (gene ppiA). - Bacterial cytosolic PPIase (gene ppiB). - Natural-killer cell
cyclophilin-related protein. This large protein (about 160 Kd) is a component of a putative tumor-recognition complex involved
in the function of NK cells. It contains a cyclophilin-type PPIase domain. - Mammalian nucleoporin Nup358 [6], a nuclear
pore complex protein of 358 Kd that contains a C-terminal cyclophilin-type PPIase domain. - Yeast hypothetical protein
YJR032W. - Fission yeast hypothetical protein SpAC21E11.05c. - Caenorhabditis elegans hypothetical protein
T27D1.1. The sequences of the different forms of cyclophilin-type PPIases are well conserved. As a signature pattern,
a conserved region was selected in the central part of these enzymes.

[1266] Consensus pattern: [FY]-x(2)-[STCNLV]-x-F-H-[RH]-[LIVMN]-[LIVM]-x(2)-F-[LIVM]-x-Q-[AG]-G- FKBP's, a
family of proteins that bind the immunosuppressive drug FK506, are also PPIases, but their sequence is not at all
related to that of cyclophilin.

[ 1] Stamnes M.A., Rutherford S.L., Zuker C.S. Trends Cell Biol. 2:272-276(1992).
[ 2] Fischer G., Schmid F.X. Biochemistry 29:2205-2212(1990).
[ 3] Trandinh C.C., Pao G.M., Saier M.H. Jr. FASEB J. 6:3410-3420(1992).
[ 4] Galat A. Eur. J. Biochem. 216:689-707(1993).
[ 5] Hacker J., Fischer G. Mol. Microbiol. 10:445-456(1993).
[ 6] Wu J., Matunis M.J., Kraemer D., Blobel G., Coutavas E. J. Biol. Chem. 270:14209-14213(1995).

[1267] 467. Profilin signature
Profilin [1,2] is a small eukaryotic protein that binds to monomeric actin (G-actin) in a 1:1 ratio thus preventing the
polymerization of actin into filaments (F-actin). It can also, in certain circumstances, promote actin polymerization.
Profilin also binds to polyphosphoinositides such as PIP2. Overall sequence similarity among profilin from organisms
which belong to different phyla (ranging from fungi to mammals) is low, but the N-terminal region is relatively well
conserved. That region is thought to be involved in the binding to actin. The signature pattern for profilin is based on
conserved residues at the N-terminal extremity. A protein structurally similar to profilin is present in the genome of
variola and vaccinia viruses (gene A42R).
[1268] Consensus pattern: <x(0,1)-[STA]-x(0,1)-W-[DENQH]-x-[YI]-x-[DEQ]
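
The leading `<` in the profilin pattern restricts it to the N-terminal extremity, and each `x(0,1)` allows zero or one arbitrary residue. A minimal sketch of that behaviour follows, with `<` translated to the regex anchor `^`; the function name and both sequences are hypothetical and serve only to illustrate the anchoring.

```python
import re

# '<' becomes '^' (start of sequence); x(0,1) becomes '.?'.
PROFILIN_NTERM = re.compile(r"^.?[STA].?W[DENQH].[YI].[DEQ]")

def has_profilin_signature(sequence: str) -> bool:
    """True if the N-terminus of the sequence satisfies the profilin pattern."""
    return PROFILIN_NTERM.search(sequence) is not None

n_terminal = "MAGWNAYIDNLMAD"      # illustrative fragment, motif at the N-terminus
internal = "KKKK" + n_terminal     # same motif, but shifted away from the N-terminus

print(has_profilin_signature(n_terminal))
print(has_profilin_signature(internal))
```

The second call fails only because of the anchor: the same residues are present, but not at the start of the sequence.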

[ 1] Haarer B.K., Brown S.S. Cell Motil. Cytoskeleton 17:71-74(1990). 
[ 2] Sohn R.H., Goldschmidt-Clermont R BioEssays 16:465-472(1994). 

[1269] 468. Protamine P1 signature

Protamines are small, highly basic proteins, that substitute for histones in sperm chromatin during the haploid phase
of spermatogenesis. They pack sperm DNA into a highly condensed, stable and inactive complex. There are two
different types of mammalian protamine, called P1 and P2. P1 has been found in all species studied, while P2 is
sometimes absent. There seems to be a single type of avian protamine whose sequence is closely related to that of
mammalian P1 [1]. As a signature for this family of proteins, a conserved region was selected at the N-terminal extremity
of the sequence.

[1270] Consensus pattern: [AV]-R-[NFY]-R-x(2,3)-[ST]-x-S-x-S-
[1271] [ 1] Oliva R., Goren R., Dixon G.H. J. Biol. Chem. 264:17627-17630(1989).
[1272] 469. Sperm histone P2 (protamine P2)
This protein also known as protamine P2 can substitute for histones in the chromatin of sperm. The alignment contains
both the sequence of the mature P2 protein and its propeptide. 
[1273] 470. Proteasome A-type subunits signature 

The proteasome (or macropain) (EC 3.4.99.46) [1 to 5,E1] is an eukaryotic and archaebacterial multicatalytic proteinase
complex that seems to be involved in an ATP/ubiquitin-dependent nonlysosomal proteolytic pathway. In eukaryotes the
proteasome is composed of about 28 distinct subunits which form a highly ordered ring-shaped structure (20S ring) of
about 700 Kd. Most proteasome subunits can be classified, on the basis of sequence similarities, into two groups, A
and B. Subunits that belong to the A-type group are proteins of from 210 to 290 amino acids that share a number of
conserved sequence regions. Subunits that are known to belong to this family are listed below. - Vertebrate subunits
C2 (nu), C3, C8, C9, iota and zeta. - Drosophila PROS-25, PROS-28.1, PROS-29 and PROS-35. - Yeast C1 (PRS1),
C5 (PRS3), C7-alpha (Y8) (PRS2), Y7, Y13, PRE5, PRE6 and PUP2. - Arabidopsis thaliana subunits alpha and PSM30.
- Thermoplasma acidophilum alpha-subunit. In this archaebacteria the proteasome is composed of only two different
subunits. As a signature pattern for proteasome A-type subunits the best conserved region was selected, which is
located in the N-terminal part of these proteins.

[1274] Consensus pattern: [FY]-x(4)-[STNV]-x-[FYW]-S-P-x-G-[RKH]-x(2)-Q-[LIVM]-[DE]-Y-[SAD]-x(2)-[SAG]-.
These proteins belong to family T1 in the classification of peptidases [6,E2].

[ 1] Rivett A.J. Biochem. J. 291:1-10(1993).
[ 2] Rivett A.J. Arch. Biochem. Biophys. 268:1-8(1989).
[ 3] Goldberg A.L., Rock K.L. Nature 357:375-379(1992).
[ 4] Wilk S. Enzyme Protein 47:187-188(1993).
[ 5] Hilt W., Wolf D.H. Trends Biochem. Sci. 21:96-102(1996).
[ 6] Rawlings N.D., Barrett A.J. Meth. Enzymol. 244:19-61(1994).

[1275] Proteasome B-type subunits signature 

The proteasome (or macropain) (EC 3.4.99.46) [1 to 5,E1] is an eukaryotic and archaebacterial multicatalytic proteinase
complex that seems to be involved in an ATP/ubiquitin-dependent nonlysosomal proteolytic pathway. In eukaryotes
the proteasome is composed of about 28 distinct subunits which form a highly ordered ring-shaped structure (20S ring)
of about 700 Kd. Most proteasome subunits can be classified, on the basis of sequence similarities, into two groups,
A and B. Subunits that belong to the B-type group are proteins of from 190 to 290 amino acids that share a number of
conserved sequence regions. Subunits that are known to belong to this family are listed below. - Vertebrate subunits
C5, beta, delta, epsilon, theta (C10-II), LMP2/RING12, C13 (LMP7/RING10), C7-I and MECL-1. - Yeast PRE1, PRE2
(PRG1), PRE3, PRE4, PRS3, PUP1 and PUP3. - Drosophila L(3)73AI. - Fission yeast pts1. - Thermoplasma
acidophilum beta-subunit. In this archaebacteria the proteasome is composed of only two different subunits. As a signature
pattern for proteasome B-type subunits the best conserved region was selected, which is located in the N-terminal part
of these proteins.

[1276] Consensus pattern: [LIVMA]-[GSA]-[LIVMF]-x-[FYLVGAC]-x(2)-[GSACFY]-[LIVMSTAC](3)-[GAC]-
[GSTACVHDES]-x(15)-[RK]-x(12,13)-G-x(2)-[GSTA]-D-. These proteins belong to family T1 in the classification of
peptidases [6,E2].
[ 1] Rivett A.J. Biochem. J. 291:1-10(1993).
[ 2] Rivett A.J. Arch. Biochem. Biophys. 268:1-8(1989).
[ 3] Goldberg A.L., Rock K.L. Nature 357:375-379(1992).
[ 4] Wilk S. Enzyme Protein 47:187-188(1993).
[ 5] Hilt W., Wolf D.H. Trends Biochem. Sci. 21:96-102(1996).
[ 6] Rawlings N.D., Barrett A.J. Meth. Enzymol. 244:19-61(1994).

[1277] 471 (pyr redox) Pyridine nucleotide-disulphide oxidoreductases class-I active site

The pyridine nucleotide-disulphide oxidoreductases are FAD flavoproteins which contain a pair of redox-active
cysteines involved in the transfer of reducing equivalents from the FAD cofactor to the substrate. On the basis of
sequence and structural similarities [1] these enzymes can be classified into two categories. The first category groups
together the following enzymes [2 to 6]: - Glutathione reductase (EC 1.6.4.2) (GR). - Higher eukaryotes thioredoxin
reductase (EC 1.6.4.5). - Trypanothione reductase (EC 1.6.4.8). - Lipoamide dehydrogenase (EC 1.8.1.4), the E3
component of alpha-ketoacid dehydrogenase complexes. - Mercuric reductase (EC 1.16.1.1). The sequence around
the two cysteines involved in the redox-active disulfide bond is conserved and can be used as a signature pattern.
[1278] Consensus pattern: G-G-x-C-[LIVA]-x(2)-G-C-[LIVM]-P [The two C's form the active site disulfide bond]. In
positions 6 and 7 of the pattern all known sequences have Asn-(Val/Ile) with the exception of GR from plant chloroplasts
and from cyanobacteria which have Ile-Arg [7].
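
Positions 6 and 7 of this pattern are the two residues covered by `x(2)`, so the observation above can be checked by capturing them. The sketch below is a hand translation of the pattern with a capture group around `x(2)`; the function name, the returned labels and the toy sequences are hypothetical.

```python
import re

# G-G-x-C-[LIVA]-x(2)-G-C-[LIVM]-P, with x(2) captured as group 1.
ACTIVE_SITE = re.compile(r"GG.C[LIVA](..)GC[LIVM]P")

def classify_positions_6_7(sequence: str):
    """Report which residue pair occupies pattern positions 6 and 7."""
    m = ACTIVE_SITE.search(sequence)
    if m is None:
        return None                      # no active-site signature found
    pair = m.group(1)
    if pair in ("NV", "NI"):
        return "Asn-(Val/Ile)"           # the usual case described in the text
    if pair == "IR":
        return "Ile-Arg"                 # reported for chloroplast/cyanobacterial GR
    return "other: " + pair

# Toy sequences invented for illustration only.
print(classify_positions_6_7("AAGGTCVNVGCIPSK"))
print(classify_positions_6_7("AAGGTCVIRGCIPSK"))
```

An "Ile-Arg" result is only a hint about the class of enzyme; the text reports the exception for plant chloroplast and cyanobacterial GR, not a rule that holds in the other direction.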

[ 1] Kuriyan J., Krishna T.S.R., Wong L., Guenther B., Pahler A., Williams C.H. Jr., Model P. Nature 352:172-174(1991).
[ 2] Rice D.W., Schulz G.E., Guest J.R. J. Mol. Biol. 174:483-496(1984).
[ 3] Brown N.L. Trends Biochem. Sci. 10:400-402(1985).
[ 4] Carothers D.J., Pons G., Patel M.S. Arch. Biochem. Biophys. 268:409-425(1989).
[ 5] Walsh C.T., Bradley M., Nadeau K. Trends Biochem. Sci. 16:305-309(1991).
[ 6] Gasdaska P.Y., Gasdaska J.R., Cochran S., Powis G. FEBS Lett. 373:5-9(1995).
[ 7] Creissen G., Edwards E.A., Enard C., Wellburn A., Mullineaux P. Plant J. 2:129-131(1991).

[1279] 472 (pyridoxal deC) DDC / GAD / HDC / TyrDC pyridoxal-phosphate attachment site (pyridoxal deC) 
Three different enzymes - all pyridoxal-dependent decarboxylases - seem to share regions of sequence similarity
[1,2,3,4] especially in the vicinity of the lysine residue which serves as the attachment site for the pyridoxal-phosphate
(PLP) group. These enzymes are: - Glutamate decarboxylase (EC 4.1.1.15) (GAD). Catalyzes the decarboxylation of
glutamate into the neurotransmitter GABA (4-aminobutanoate). - Histidine decarboxylase (EC 4.1.1.22) (HDC).
Catalyzes the decarboxylation of histidine to histamine. There are two completely unrelated types of HDC: those that use
PLP as a cofactor (found in Gram-negative bacteria and mammals), and those that contain a covalently bound pyruvoyl
residue (found in Gram-positive bacteria). - Aromatic-L-amino-acid decarboxylase (EC 4.1.1.28) (DDC), also known
as L-dopa decarboxylase or tryptophan decarboxylase. DDC catalyzes the decarboxylation of tryptophan to tryptamine.
It also acts on 5-hydroxy-tryptophan and dihydroxyphenylalanine (L-dopa). - Tyrosine decarboxylase (EC 4.1.1.25)
(TyrDC) which converts tyrosine into tyramine, a precursor of isoquinoline alkaloids and various amides. These enzymes
are collectively known as group II decarboxylases [3,4].

[1280] Consensus pattern: S-[LIVMFYW]-x(5)-K-[LIVMFYWG](2)-x(3)-[LIVMFYW]-x-[CA]-x(2)-[LIVMFYWQ]-x(2)-[RK]
[K is the pyridoxal-P attachment site]

[ 1] Jackson F.R. J. Mol. Evol. 31:325-329(1990).
[ 2] Joseph D.R., Sullivan P., Wang Y.-M., Kozak C., Fenstermacher D.A., Behrendsen M.E., Zahnow C.A. Proc.
Natl. Acad. Sci. U.S.A. 87:733-737(1990).
[ 3] Sandmeier E., Hale T.I., Christen P. Eur. J. Biochem. 221:997-1002(1994).
[ 4] Ishii S., Mizuguchi H., Nishino J., Hayashi H., Kagamiyama H. J. Biochem. 120:369-376(1996).

[1281] 473. Regulator of chromosome condensation (RCC1) signatures (RCC1)

The regulator of chromosome condensation (RCC1) [1] is a eukaryotic protein which binds to chromatin and interacts
with ran, a nuclear GTP-binding protein, to promote the loss of bound GDP and the uptake of fresh GTP, thus acting
as a guanine-nucleotide dissociation stimulator (GDS) [2]. The interaction of RCC1 with ran probably plays an important
role in the regulation of gene expression. RCC1, known as PRP20 or SRM1 in yeast, pim1 in fission yeast and BJ1 in
Drosophila, is a protein that contains seven tandem repeats of a domain of about 50 to 60 amino acids. As shown in
the following schematic representation, the repeats make up the major part of the length of the protein. Outside the
repeat region, there is just a small N-terminal domain of about 40 to 50 residues and, in the Drosophila protein only, a
C-terminal domain of about 130 residues.

+------+------+------+------+------+------+------+------+------------+
|N-ter.|Rpt. 1|Rpt. 2|Rpt. 3|Rpt. 4|Rpt. 5|Rpt. 6|Rpt. 7| C-terminal |
+------+------+------+------+------+------+------+------+------------+
(C-terminal domain: in Drosophila only.)

Two signature patterns for RCC1 were developed. The first is found in the N-terminal part
of the second repeat; this is the most conserved part of RCC1. The second is derived from conserved positions in the
C-terminal part of each repeat and detects up to five copies of the repeated domain. The RCC1-type of repeat is also
found in the X-linked retinitis pigmentosa GTPase regulator [3].
[1282] Consensus pattern: G-x-N-D-x(2)-[AV]-L-G-R-x-T-

Consensus pattern: [LIVMFA]-[STAGC](2)-G-x(2)-H-[STAGLI]-[LIVMFA]-x-[LIVM]-
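
Since the second RCC1 pattern is described as detecting several copies of the repeated domain, a scan should report every non-overlapping match rather than stop at the first. A minimal sketch follows, using a hand-translated regex and `re.finditer`; the function name and the toy sequence (two copies of an invented 11-residue unit) are hypothetical.

```python
import re

# [LIVMFA]-[STAGC](2)-G-x(2)-H-[STAGLI]-[LIVMFA]-x-[LIVM]
RCC1_REPEAT = re.compile(r"[LIVMFA][STAGC]{2}G.{2}H[STAGLI][LIVMFA].[LIVM]")

def repeat_positions(sequence: str) -> list[int]:
    """Return the 1-based start of every non-overlapping pattern match."""
    return [m.start() + 1 for m in RCC1_REPEAT.finditer(sequence)]

unit = "LSAGKKHSLAL"                          # invented unit that fits the pattern
toy = "PPP" + unit + "DEDEDE" + unit + "QQ"   # two copies separated by a linker

positions = repeat_positions(toy)
print(len(positions), positions)
```

The number of hits gives a lower bound on the number of repeats, since the text notes that the pattern detects up to five of the seven copies.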

[ 1] Dasso M. Trends Biochem. Sci. 18:96-101(1993).
[ 2] Boguski M.S., McCormick F. Nature 366:643-654(1993).
[ 3] Roepman R., Van Duijnhoven G., Rosenberg T., Pinckers A.J.L.G., Bleeker-Wagemakers L.M., Bergen A.A.B.,
Post J., Beck A., Reinhardt R., Ropers H.-H., Cremers F., Berger W. Hum. Mol. Genet. 5:1035-1041(1996).

[1283] 474. RNA 3'-terminal phosphate cyclase signature (RCT) 

RNA 3'-terminal phosphate cyclase (EC 6.5.1.4) [1,2] catalyzes the conversion of 3'-phosphate to a 2',3'-cyclic
phosphodiester at the end of RNA. The biological role of this enzyme is unknown but it is likely to function in some aspects
of cellular RNA processing. The reaction catalyzed by the enzyme occurs in three steps: 1) adenylation of the enzyme
by ATP; 2) the enzyme acts on RNA-3'terminal phosphate to produce RNA-3'terminal diphosphate adenylate; 3)
Release of AMP and cyclisation by a non catalytic nucleophilic attack by the adjacent 2'hydroxyl on the phosphorus in
the diester linkage. This enzyme, which has been characterized in human (where there seems to be at least three
isozymes) and Escherichia coli (gene rtcA), seems to be taxonomically widespread. It is found in insects, plants, fungi
(gene RTC1 in yeast) and in archaebacteria. RNA cyclase is a protein of from 36 to 42 Kd. The best conserved region,
which is used as a signature pattern, is a glycine-rich stretch of residues located in the central part of the sequence
and which is reminiscent of various ATP, GTP or AMP glycine-rich loops. In this context, the conserved Arg (His in the
E.coli enzyme) could be the AMP-binding residue.

[1284] Consensus pattern: [RH]-G-x(2)-P-x-G(3)-x-[LIV]

[ 1] Genschik P., Billy E., Swianiewicz M., Filipowicz W. EMBO J. 16:2955-2967(1997).
[ 2] Filipowicz W., Vincente O. Meth. Enzymol. 181:499-510(1990).

[1285] 475. REV protein (anti-repression trans-activator protein)

[1286] 476. Prokaryotic-type class I peptide chain release factors signature (RF-1) 

Peptide chain release factors (RFs) are required for the termination of protein biosynthesis [1]. At present two classes
of RFs can be distinguished. Class I RFs bind to ribosomes that have encountered a stop codon at their decoding site
and induce release of the nascent polypeptide. Class II RFs are GTP-binding proteins that interact with class I RFs
and enhance class I RF activity. In prokaryotes there are two class I RFs that act in a codon specific manner [2]: RF-1
(gene prfA) mediates UAA and UAG-dependent termination while RF-2 (gene prfB) mediates UAA and UGA-dependent
termination. RF-1 and RF-2 are structurally and evolutionary related proteins which have been shown [3] to make up
a family that also contains the following proteins: - Fungal MRF1, a mitochondrial RF (m-RF) which recognizes the
UAA and UAG codons. - Escherichia coli RF-H, a protein of unknown function. - Escherichia coli hypothetical protein
yaeJ and a close Pseudomonas putida homolog. A highly conserved region located in the central part of the 40 to 45
Kd RF-1/2 and m-RF and in the N-terminal of the 15 to 16 Kd RF-H and yaeJ is used as a signature pattern.
[1287] Consensus pattern: [AR]-[STA]-x-G-x-G-G-Q-[HNGCS]-V-N-x(3)-[ST]-A-[IV]

Note that prokaryotic-type class I RFs display no significant sequence similarity to prokaryotic-type class II which belong
to the family of GTP-binding elongation factors nor to eukaryotic class I or class II RFs.
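
The codon specificity of the two prokaryotic class I release factors described above can be written down as a small lookup table. This is only a restatement of the text (RF-1: UAA and UAG; RF-2: UAA and UGA); the names `CLASS_I_RF_CODONS` and `release_factors_for` are hypothetical.

```python
# Stop codons recognized by each prokaryotic class I release factor,
# as stated in the text above.
CLASS_I_RF_CODONS = {
    "RF-1": {"UAA", "UAG"},
    "RF-2": {"UAA", "UGA"},
}

def release_factors_for(codon: str) -> list[str]:
    """Return the class I release factors that recognize the given codon."""
    codon = codon.upper().replace("T", "U")  # accept DNA-style codons as well
    return sorted(rf for rf, codons in CLASS_I_RF_CODONS.items() if codon in codons)

for stop in ("UAA", "UAG", "UGA"):
    print(stop, release_factors_for(stop))
```

UAA is the only stop codon recognized by both factors; a sense codon such as AUG returns an empty list.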

[ 1] Tate W.P., Poole E.S., Mannering S.M. Prog. Nucleic Acids. Res. Mol. Biol. 52:293-335(1996).
[ 2] Craigen W.J., Lee C.C., Caskey C.T. Mol. Microbiol. 4:861-865(1990).
[ 3] Pel H.J., Rep M., Grivell L.A. Nucleic Acids Res. 20:4423-4428(1992).


[1288] 477. RIO1/ZK632.3/MJ0444 family signature

The following uncharacterized proteins are evolutionary related [1]: - Yeast protein RIO1. - Caenorhabditis elegans
hypothetical protein ZK632.3. - Methanococcus jannaschii hypothetical protein MJ0444. - Thermoplasma acidophilum
hypothetical protein in rpoA2 3'region. The eukaryotic members of this family are proteins of about 55 to 60 Kd, while
the archaebacterial ones are half that size. The central part of these proteins is highly conserved. The best conserved
region is used as a signature pattern.
[1289] Consensus pattern: [LIVM]-V-H-[GA]-D-L-S-E-[FY]-N-x-[LIVM] 
[1290] [ 1] Bairoch A. Unpublished observations (1997).
[1291] 478. (RIP) Shiga/ricin ribosomal inactivating toxins active site signature

A number of bacterial and plant toxins
act by inhibiting protein synthesis in eukaryotic cells. The toxins of the Shiga and ricin family inactivate 60S ribosomal
subunits by an N-glycosidic cleavage which releases a specific adenine base from the sugar-phosphate backbone of
28S rRNA [1,2,3]. The toxins which are known to function in this manner are: - Shiga toxin from Shigella dysenteriae
[4]. This toxin is composed of one copy of an enzymatically active A subunit and five copies of a B subunit responsible
for binding the toxin complex to specific receptors on the target cell surface. - Shiga-like toxins (SLT) are a group of
Escherichia coli toxins very similar in their structure and properties to Shiga toxin. The sequence of two types of these
toxins, SLT-1 [5] and SLT-2 [6], is known. - Ricin, a potent toxin from castor bean seeds. Ricin consists of two
glycosylated chains linked by a disulfide bond. The A chain is enzymatically active. The B chain is a lectin with a binding
preference for galactosides. Both chains are encoded by a single polypeptidic precursor. Ricin is classified as a
type-II ribosome-inactivating protein (RIP); other members of this family are agglutinin, also from castor bean, and abrin
from the seeds of the bean Abrus precatorius [7]. - Single chain ribosome-inactivating proteins (type-I RIP) from plants.
Examples of such proteins are: barley protein synthesis inhibitors I and II, mongolian snake-gourd trichosanthin, sponge
gourd luffin-A and -B, garden four-o'clock MAP, common pokeberry PAP-S and soapwort saporin-6 [7]. All these toxins
are structurally related. A conserved glutamic residue has been implicated [8] in the catalytic mechanism; it is located
near a conserved arginine which also plays a role in catalysis [9]. The signature that has been developed for these
proteins includes these catalytic residues.

[1292] Consensus pattern: [LIVMA]-x-[LIVMSTA](2)-x-E-[SAGV]-[STAL]-R-[FY]-[RKNQS]-x-[LIVM]-[EQS]-x(2)-[LIVMF]
[E and R are active site residues]
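
A consensus pattern written in this notation can be checked mechanically by translating it into a regular expression. The following is a minimal sketch, not part of the original disclosure: the helper name `prosite_to_regex` and the test peptide are assumptions made for illustration, and the helper handles only the notation elements used in this section (single residues, `x`, bracketed alternatives, and `(n)` or `(n,m)` repeat counts).

```python
import re

# One pattern element: a residue, 'x', or a bracketed set, optionally
# followed by a repeat count such as (2) or (2,3).
_ELEMENT = re.compile(r"(\[[A-Z]+\]|[A-Zx])(?:\((\d+)(?:,(\d+))?\))?")


def prosite_to_regex(pattern: str) -> str:
    """Translate a hyphen-separated consensus pattern into a Python regex."""
    parts = []
    for element in pattern.strip().split("-"):
        m = _ELEMENT.fullmatch(element)
        if m is None:
            raise ValueError(f"unrecognised pattern element: {element!r}")
        core, low, high = m.groups()
        if core == "x":      # 'x' stands for any residue
            core = "."
        if low is not None:  # (n) or (n,m) repeat count
            core += "{" + low + ("," + high if high else "") + "}"
        parts.append(core)
    return "".join(parts)


# Pattern of paragraph [1292]; E and R are the active site residues.
RIP_PATTERN = ("[LIVMA]-x-[LIVMSTA](2)-x-E-[SAGV]-[STAL]-R-[FY]-"
               "[RKNQS]-x-[LIVM]-[EQS]-x(2)-[LIVMF]")
RIP_REGEX = re.compile(prosite_to_regex(RIP_PATTERN))

# Hypothetical test peptide, chosen only so that it satisfies the pattern.
peptide = "GGIQMISEAARFQYIEGEMGG"
hit = RIP_REGEX.search(peptide)
if hit:
    # E is the 6th and R the 9th pattern position (offsets 5 and 8).
    print("match at", hit.start(),
          "E at", hit.start() + 5, "R at", hit.start() + 8)
```

Because every element before the E and R positions has a fixed length, the active site residues can be located by a constant offset from the start of the match; patterns with `(n,m)` ranges would need capture groups instead.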

[1293] [1] Endo Y., Tsurugi K., Takeda Y., Ogasawara T., Igarashi K. Eur. J. Biochem. 171:45-50(1988).
[2] May M.J., Hartley M.R., Roberts L.M., Krieg P.A., Osborn R.W., Lord J.M. EMBO J. 8:301-308(1989).
[3] Funatsu G., Islam M.R., Minami Y., Sung-Sil K., Kimura M. Biochimie 73:1157-1161(1991).
[4] Strockbine N.A., Jackson M.P., Sung L.M., Holmes R.K., O'Brien A.D. J. Bacteriol. 170:1116-1122(1988).
[5] Calderwood S.B., Auclair F., Donohue-Rolfe A., Keusch G.T., Mekalanos J.J. Proc. Natl. Acad. Sci. U.S.A. 84:4364-4368(1987).
[6] Jackson M.P., Neill R.J., O'Brien A.D., Holmes R.K., Newland J.W. FEMS Microbiol. Lett. 44:109-114(1987).
[7] Barbieri L., Battelli M.G., Stirpe F. Biochim. Biophys. Acta 1154:237-282(1993).
[8] Hovde C.J., Calderwood S.B., Mekalanos J.J., Collier R.J. Proc. Natl. Acad. Sci. U.S.A. 85:2568-2572(1988).
[9] Monzingo A.F., Collins E.J., Ernst S.R., Irvin J.D., Robertus J.D. J. Mol. Biol. 233:705-715(1993).

[1294] 479. Bacterial RNA polymerase, alpha chain (RNA pol A bac) 

Members of this family include alpha subunit from eubacteria and alpha subunits from chloroplasts. The alpha subunit
of RNA polymerase consists of two independently folded domains, referred to as amino-terminal and carboxyl-terminal
domains. The amino-terminal domain is involved in the interaction with the other subunits of the RNA polymerase. The
carboxyl-terminal domain interacts with the DNA and activators. The amino acid sequence of the alpha subunit is
conserved in prokaryotic and chloroplast RNA polymerases. There are three regions of particularly strong conservation,
two in the amino-terminal and one in the carboxyl-terminal [3].

[1] Zhang G, Darst SA; Science 1998;281:262-266. [2] Jeon YH, Negishi T, Shirakawa M, Yamazaki T, Fujita N, Ishihama
A, Kyogoku Y; Science 1995;270:1495-1497. [3] Ebright RH, Busby S; Curr Opin Genet Dev 1995;5:197-203. [4] Murakami
K, Kimura M, Owens JT, Meares CF, Ishihama A; Proc Natl Acad Sci USA 1997;94:1709-1714.
[1295] 480. RNA polymerase beta subunit (RNA pol B) 

RNA polymerases catalyse the DNA-dependent polymerisation of RNA. Prokaryotes contain a single RNA polymerase
compared to three in eukaryotes (not including mitochondrial and chloroplast polymerases). Each RNA polymerase
complex contains two related members of this family; in each case they are the two largest subunits.
[1] Falkenburg D, Dworniczak B, Faust DM, Bautz EK; J Mol Biol 1987;195:929-937.
[1296] 481. RNA polymerases H / 23 Kd subunits signature 

In eukaryotes, there are three different forms of DNA-dependent RNA polymerases (EC 2.7.7.6) transcribing different
sets of genes. Each class of RNA polymerase is an assemblage of ten to twelve different polypeptides. In archaebacteria
there is generally a single form of RNA polymerase which also consists of an oligomeric assemblage of 10 to 13
polypeptides. Archaebacterial subunit H (gene rpoH) [1,2] is a small protein of about 8.5 to 10 Kd; it is evolutionarily
related to the C-terminal part of a 23 Kd component shared by all three forms of eukaryotic RNA polymerases (gene
RPB5 in yeast and POLR2E in mammals). As a signature pattern a conserved region was selected which is located at
the N-terminal extremity of subunit H; this region contains two histidines that could play a role in the binding of a metal ion.
[1297] Consensus pattern: H-[NEI]-[LIVM]-V-P-x-H-x(2)-[LIVM]-x(2)-[DE]

[1] Klenk H.-P., Palm P., Lottspeich F., Zillig W. Proc. Natl. Acad. Sci. U.S.A. 89:407-410(1992).
[2] Thiru A., Hodach M., Eloranta J.J., Kostourou V., Weinzierl R.O., Matthews S. J. Mol. Biol. 287:753-760(1999).

[1298] 482. RNA polymerases K / 14 to 18 Kd subunits signature 

In eukaryotes, there are three different forms of DNA-dependent RNA polymerases (EC 2.7.7.6) transcribing different
sets of genes. Each class of RNA polymerase is an assemblage of ten to twelve different polypeptides. In archaebacteria,
there is generally a single form of RNA polymerase which also consists of an oligomeric assemblage of 10 to 13
polypeptides. A component of 14 to 18 Kd shared by all three forms of eukaryotic RNA polymerases and which has
been sequenced in budding yeast (gene RPB6 or RPO26), in fission yeast (gene rpb6 or rpo15), in human and in African
swine fever virus [1] is evolutionarily related [2] to archaebacterial subunit K (gene rpoK). The archaebacterial protein
is colinear with the C-terminal part of the eukaryotic subunit.
[1299] Consensus pattern: [ST]-x-[FY]-E-x-[AT]-R-x-[LIVM]-[GSA]-x-R-[SA]-x-Q

[1] Lu Z., Kutish G.F., Sussman M.D., Rock D.L. Nucleic Acids Res. 21:2940-2940(1993).
[2] McKune K., Woychik N.A. J. Bacteriol. 176:4754-4756(1994).

[1300] 483. RNA polymerases L / 13 to 16 Kd subunits signature

In eukaryotes, there are three different forms of DNA-dependent RNA polymerases (EC 2.7.7.6) transcribing different
sets of genes. Each class of RNA polymerase is an assemblage of ten to twelve different polypeptides. In archaebacteria,
there is generally a single form of RNA polymerase which also consists of an oligomeric assemblage of 10 to 13
polypeptides. It has been shown that small subunits of about 13 to 16 Kd found in all three types of eukaryotic
polymerases are highly conserved. Subunits known to belong to this family are: - Budding yeast RPC19 subunit from RNA
polymerases I and III [1]. - Budding yeast RPB11 subunit from RNA polymerase II [2]. - Mammalian RPB11 (gene
POLR2K) from RNA polymerase II. - Caenorhabditis elegans hypothetical protein F58A4.9. - Methanococcus jannaschii
RNA polymerase subunit L (gene rpoL). - Sulfolobus acidocaldarius RNA polymerase subunit L (gene rpoL) [3]. As a
signature pattern a conserved region was selected which is located at the N-terminal extremity of these polymerase
subunits; this region contains two cysteines that could play a role in the binding of a metal ion.
[1301] Consensus pattern: [DE](2)-H-[ST]-[LIVM]-[GAP]-N-x(11)-V-x-[FM]-x(2)-Y-x(3)-H-P

[1] Dequard-Chablat M., Riva M., Carles C., Sentenac A. J. Biol. Chem. 266:15300-15307(1991).
[2] Woychik N.A., McKune K., Lane W.S., Young R.A. Gene Expr. 3:77-82(1993).
[3] Langer D. EMBL/GenBank: X70805.

[1302] 484. RNA polymerases N / 8 Kd subunits signature 

In eukaryotes, there are three different forms of DNA-dependent RNA polymerases (EC 2.7.7.6) transcribing different
sets of genes. Each class of RNA polymerase is an assemblage of ten to twelve different polypeptides. In archaebacteria,
there is generally a single form of RNA polymerase which also consists of an oligomeric assemblage of 10 to 13
polypeptides. Archaebacterial subunit N (gene rpoN) [1] is a small protein of about 8 Kd; it is evolutionarily related [2]
to an 8.3 Kd component shared by all three forms of eukaryotic RNA polymerases (gene RPB10 in yeast and POLR2J
in mammals) as well as to African swine fever virus protein CP80R [3]. As a signature pattern a conserved region was
selected which is located at the N-terminal extremity of these polymerase subunits; this region contains two cysteines
that could play a role in the binding of a metal ion.
[1303] Consensus pattern: [LIVMF](2)-P-[LIVM]-x-C-F-[ST]-C-G

[1] Langer D., Hain J., Thuriaux P., Zillig W. Proc. Natl. Acad. Sci. U.S.A. 92:5768-5772(1995).
[2] McKune K., Woychik N.A. J. Bacteriol. 176:4754-4756(1994).
[3] Yanez R.J., Rodriguez J.M., Nogal M.L., Yuste L., Enriquez C., Rodriguez J.F., Vinuela E. Virology 208:249-278(1995).
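
The four RNA polymerase subunit signatures above (H, K, L and N) can be applied together to classify a candidate sequence. The sketch below is illustrative only: the regular expressions are hand translations of the consensus patterns of paragraphs [1297], [1299], [1301] and [1303], and the dictionary keys, the function name `scan` and the test peptide are assumptions introduced for this example.

```python
import re

# Hand-translated regular expressions for the four consensus patterns
# ('x' becomes '.', a repeat count '(n)' becomes '{n}').
SIGNATURES = {
    "H / 23 Kd": r"H[NEI][LIVM]VP.H.{2}[LIVM].{2}[DE]",
    "K / 14 to 18 Kd": r"[ST].[FY]E.[AT]R.[LIVM][GSA].R[SA].Q",
    "L / 13 to 16 Kd": r"[DE]{2}H[ST][LIVM][GAP]N.{11}V.[FM].{2}Y.{3}HP",
    "N / 8 Kd": r"[LIVMF]{2}P[LIVM].CF[ST]CG",
}


def scan(sequence: str) -> dict:
    """Return, per signature, the 0-based start positions of its matches."""
    return {name: [m.start() for m in re.finditer(rx, sequence)]
            for name, rx in SIGNATURES.items()}


# Hypothetical peptide built to contain the N and K signatures only.
peptide = "MKLLPVACFSCGKKSAFEAARALGARSAQ"
for name, starts in scan(peptide).items():
    print(name, starts)
```

An empty list for a signature means the sequence carries no occurrence of that pattern; `re.finditer` reports non-overlapping matches only, which is sufficient for signatures that occur once per protein.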

[1304] 485. Ribonuclease HII

[1] Mian IS; Nucleic Acids Res 1997;25:3187-3189. 
[1305] 486. Ribonuclease PH signature 

Prokaryotic ribonuclease PH (EC 2.7.7.56) (RNase PH) [1] is a phosphorolytic exoribonuclease that removes nucleotide
residues following the -CCA terminus of tRNA and adds nucleotides to the ends of RNA molecules by using nucleoside
diphosphates as substrates. RNase PH is a conserved protein of about 240 amino-acid residues. It is evolutionarily
related to Caenorhabditis elegans hypothetical protein B0564.1. As a signature pattern, the most highly conserved
region was selected which is located in the central part of these proteins.

Consensus pattern: C-[DE]-[LIVM](2)-Q-[GTA]-D-G-[SG]-x(2)-[TA]-A

[1] Kelly K.O., Deutscher M.P. J. Biol. Chem. 267:17153-17158(1992).
[1306] 487. RanBP1 domain

[1] Di Matteo G, Fuschi P, Zerfass K, Moretti S, Ricordy R, Cenciarelli C, Tripodi M, Jansen-Durr P, Lavia P; Cell Growth
Differ 1995;6:1213-1224.

[1307] 488. Rhodanese signatures
Rhodanese (thiosulfate sulfurtransferase) (EC 2.8.1.1) [1,2] is an enzyme which catalyzes the transfer of the sulfane
atom of thiosulfate to cyanide, to form sulfite and thiocyanate. In vertebrates, rhodanese is a mitochondrial enzyme of
about 300 amino-acid residues involved in forming iron-sulfur complexes and cyanide detoxification. A cysteine residue
takes part in the catalytic mechanism. Some bacterial proteins closely related to rhodanese are also thought to express
a sulfotransferase activity. These are: - Azotobacter vinelandii rhdA. - Escherichia coli sseA [3]. - Saccharopolyspora
erythraea cysA [4]. - Synechococcus strain PCC 7942 rhdA [5]. RhdA is a periplasmic protein probably involved in the
transport of sulfur compounds. Two patterns for the rhodanese family were developed. They are based on highly
conserved regions, one which is located in the N-terminal region, the other at the C-terminal extremity of the enzyme.
[1308] Consensus pattern: [FY]-x(3)-H-[LIV]-P-G-A-x(2)-[LIVF]

Consensus pattern: [FY]-[DEAF]-G-[SA]-W-x-E-[FYW]

[1] Westley J. Meth. Enzymol. 77:285-291(1981).
[2] Weiland K.L., Dooley T.P. Biochem. J. 275:227-231(1991).
[3] Rudd K.E. Unpublished observations (1993).
[4] Donadio S., Shafiee A., Hutchinson C.R. J. Bacteriol. 172:350-360(1990).
[5] Laudenbach D.E., Ehrhardt D., Green L., Grossman A.R. J. Bacteriol. 173:2751-2760(1991).

[1309] 489. Ribonuclease III family signature 

Prokaryotic ribonuclease III (EC 3.1.26.3) (gene rnc) [1] is an enzyme that digests double-stranded RNA. It is involved
in the processing of ribosomal RNA precursors and of some mRNAs. RNase III is evolutionarily related [2] to the following
proteins: - Fission yeast pac1, a ribonuclease that probably inhibits mating and meiosis by degrading a specific mRNA
required for sexual development. - Yeast ribonuclease III (gene RNT1), a dsRNA-specific nuclease that cleaves
eukaryotic preribosomal RNA at various sites. - Caenorhabditis elegans hypothetical protein F26E4.13. - Paramecium
bursaria chlorella virus 1 protein A464R. - Synechocystis strain PCC 6803 hypothetical protein slr0346. - Fission yeast
hypothetical protein SpAC8A4.08c, a protein with an N-terminal helicase domain and a C-terminal RNase III domain. -
Caenorhabditis elegans hypothetical protein K12H4.8, a protein with the same structure as SpAC8A4.08c. These
proteins share regions of sequence similarity, one of which is a highly conserved stretch of 9 residues which has been
developed as a signature pattern.

[1310] Consensus pattern: [DEQ]-[RQ]-[LM]-E-[FYW]-[LV]-G-D-[SAR]
[1] Nashimoto H., Uchida H. Mol. Gen. Genet. 201:25-29(1985).
[2] Mian I.S. Nucleic Acids Res. 25:3187-3195(1997).

[1311] 490. Rieske iron-sulfur protein signatures 

Ubiquinol-cytochrome c reductase (EC 1.10.2.2) (also known as the bc1 complex or complex III) is one of the electron
transport chains of mitochondria and of some aerobic prokaryotes; it catalyzes the oxidoreduction of ubiquinol and
cytochrome c. In the chloroplast of plants and in cyanobacteria plastoquinone-plastocyanin reductase (EC 1.10.99.1)
(also known as the b6f complex) is functionally similar and catalyzes the oxidoreduction of plastoquinol and cytochrome
f. One of the components of these electron transfer systems is an iron-sulfur protein with a 2Fe-2S cluster, which is
called the Rieske protein [1,2]. The Rieske protein contains approximately 190 amino acid residues. The iron-sulfur
cluster is complexed to the protein through cysteine and histidine residues. Two perfectly conserved regions in Rieske
proteins contain all the residues that bind the iron-sulfur cluster. Both regions contain two cysteines and a histidine.
The first cysteine and the histidine are 2Fe-2S ligands while the remaining cysteines form a disulfide bond [3]. Two
conserved regions were selected as signature patterns.

[1312] Consensus pattern: C-[TK]-H-L-G-C-[LIVST] [The first C and the H are 2Fe-2S ligands] [The second C is
involved in a disulfide bond]

Consensus pattern: C-P-C-H-x-[GSA] [The first C and the H are 2Fe-2S ligands] [The second C is involved in a disulfide
bond]
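
The residue annotations attached to the two Rieske patterns can be turned into positions on a sequence. The following is a minimal sketch under stated assumptions: the names `BOX_1`, `BOX_2` and `rieske_sites` and the test peptide are hypothetical, and the regular expressions are hand translations of the two consensus patterns of paragraph [1312].

```python
import re

# The two Rieske consensus patterns, hand-translated to regular expressions.
BOX_1 = re.compile(r"C[TK]HLGC[LIVST]")  # C-[TK]-H-L-G-C-[LIVST]
BOX_2 = re.compile(r"CPCH.[GSA]")        # C-P-C-H-x-[GSA]


def rieske_sites(sequence: str):
    """Return (ligand positions, disulfide cysteine positions), 0-based,
    or None if either pattern is absent from the sequence."""
    m1, m2 = BOX_1.search(sequence), BOX_2.search(sequence)
    if m1 is None or m2 is None:
        return None
    # Box 1: C at offset 0 and H at offset 2 are 2Fe-2S ligands;
    #        the second C (offset 5) takes part in the disulfide bond.
    # Box 2: C at offset 0 and H at offset 3 are 2Fe-2S ligands;
    #        the second C (offset 2) takes part in the disulfide bond.
    ligands = [m1.start(), m1.start() + 2, m2.start(), m2.start() + 3]
    disulfide = [m1.start() + 5, m2.start() + 2]
    return ligands, disulfide


# Hypothetical peptide containing both boxes.
peptide = "AACTHLGCVAAACPCHGSAA"
print(rieske_sites(peptide))
```

Requiring both boxes before reporting any site reflects the statement that the two conserved regions together contain all the cluster-binding residues.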

[1] Gatti F.L., Meinhardt S.W., Ohnishi T., Tzagoloff A. J. Mol. Biol. 205:421-435(1989).
[2] Kallas T., Spiller S., Malkin R. Proc. Natl. Acad. Sci. U.S.A. 85:5794-5798(1988).
[3] Iwata S., Saynovits M., Link T.A., Michel H. Structure 4:567-579(1996).

[1313] 491. Ribosomal protein L1 signature
Ribosomal protein L1 is the largest protein from the large ribosomal subunit. In Escherichia coli, L1 is known to bind to
the 23S rRNA. It belongs to a family of ribosomal proteins which, on the basis of sequence similarities [1,2], groups:
- Eubacterial L1. - Algal and plant chloroplast L1. - Cyanelle L1. - Archaebacterial L1. - Vertebrate L10A. - Yeast
SSM1. As a signature pattern, the best conserved region was selected located in the central section of these proteins.
It is located at the end of an alpha helix thought to be involved in RNA-binding.

[1314] Consensus pattern: [IM]-x(2)-[LIVA]-x(2,3)-[LIVM]-G-x(2)-[LMS]-[GSNH]-[PTKR]-[KRAV]-G-x-[LIMF]-P-[DENSTKQ]

[1] Nikonov S.V., Nevskaya N., Eliseikina I.A., Fomenkova N.P., Nikulin A., Ossina N., Garber M., Jonsson B.-H.,
Briand C., Al-Karadaghi S., Svensson L.A., Aevarsson A., Liljas A. EMBO J. 15:1350-1359(1996).
[2] Olvera J., Wool I.G. Biochem. Biophys. Res. Commun. 220:954-957(1996).

[1315] 492. Ribosomal protein L10 signature 
Ribosomal protein L10 is one of the proteins from the large ribosomal subunit. L10 is a protein of 162 to 185
amino-acid residues which has only been found so far in eubacteria. A conserved region located in the N-terminal section of
these proteins was used as a signature pattern.

[1316] Consensus pattern: [DEH]-x(2)-[GS]-[LIVMF]-[STN]-[VA]-x-[DEQK]-[LIVMA]-x(2)-[LIM]-R
[1317] 493. Ribosomal protein L10e signature 

A number of eukaryotic and archaebacterial ribosomal proteins can be grouped on the basis of sequence similarities.
One of these families consists of: - Vertebrate L10 (QM) [1]. - Plant L10. - Caenorhabditis elegans L10 (F10B5.1). -
Yeast L10 (QSR1). - Methanococcus jannaschii MJ0543. These proteins have 174 to 232 amino-acid residues. A
conserved region located in the central section was selected as a signature pattern.
[1318] Consensus pattern: R-x-A-[FYW]-G-K-[PA]-x-G-x(2)-A-R-V

[1] Chan Y.-L., Diaz J.-J., Denoroy L., Madjar J.-J., Wool I.G. Biochem. Biophys. Res. Commun. 255:952-956(1996).

[1319] 494. Ribosomal protein L11 signature 

[1320] Ribosomal protein L11 is one of the proteins from the large ribosomal subunit. In Escherichia coli, L11 is
known to bind directly to the 23S rRNA. It belongs to a family of ribosomal proteins which, on the basis of sequence
similarities [1,2], groups:

- Eubacterial L11.
- Plant chloroplast L11 (nuclear-encoded).
- Red algal chloroplast L11.
- Cyanelle L11.
- Archaebacterial L11.
- Mammalian L12.
- Plants L12.
- Yeast L12 (YL15).

[1321] L11 is a protein of 140 to 165 amino-acid residues. A conserved region located in the C-terminal section of
these proteins was selected as a signature pattern. In Escherichia coli, the C-terminal half of L11 has been shown [3]
to be in an extended and loosely folded conformation and is likely to be buried within the ribosomal structure.
[1322] Consensus pattern: [RKN]-x-[LIVM]-x-G-[ST]-x(2)-[SNQ]-[LIVM]-G-x(2)-[LIVM]-x(0,1)-[DENG]
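
The element x(0,1) in the L11 pattern denotes an optional residue, so the signature spans either 15 or 16 positions. The sketch below illustrates this with a regular expression; it assumes the pattern reads [RKN]-x-[LIVM]-x-G-[ST]-x(2)-[SNQ]-[LIVM]-G-x(2)-[LIVM]-x(0,1)-[DENG], and the name `L11_SIGNATURE` and the three peptides are hypothetical examples, not sequences from the disclosure.

```python
import re

# [RKN]-x-[LIVM]-x-G-[ST]-x(2)-[SNQ]-[LIVM]-G-x(2)-[LIVM]-x(0,1)-[DENG]
# The range x(0,1) translates to '.{0,1}': zero or one residue of any kind.
L11_SIGNATURE = re.compile(
    r"[RKN].[LIVM].G[ST].{2}[SNQ][LIVM]G.{2}[LIVM].{0,1}[DENG]"
)

# Hypothetical peptides: no residue, one residue, and two residues between
# the last [LIVM] position and the final [DENG] position.
without_gap = "KALAGSAASLGAAVD"
with_gap = "KALAGSAASLGAAVAD"
too_long = "KALAGSAASLGAAVAAD"

for peptide in (without_gap, with_gap, too_long):
    # fullmatch requires the whole peptide to fit the signature.
    print(peptide, bool(L11_SIGNATURE.fullmatch(peptide)))
```

The first two peptides satisfy the signature and the third does not, because two residues exceed the upper bound of the x(0,1) range.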

[1] Pucciarelli G., Remacha M., Ballesta J.P.G. Nucleic Acids Res. 18:4409-4416(1990).
[2] Otaka E., Hashimoto T., Mizuta K., Suzuki K. Protein Seq. Data Anal. 5:301-313(1993).
[3] Choli T. Biochem. Int. 19:1323-1338(1989).

[1323] 495. Ribosomal protein L7/L12 C-terminal domain

[1324] [1] Leijonmarck M, Liljas A; J Mol Biol 1987;195:555-579.
[1325] 496. Ribosomal protein L13 signature 

Ribosomal protein L13 is one of the proteins from the large ribosomal subunit. In Escherichia coli, L13 is known to be
one of the early assembly proteins of the 50S ribosomal subunit. It belongs to a family of ribosomal proteins which, on
the basis of sequence similarities [1], groups:

- Eubacterial L13.
- Plant chloroplast L13 (nuclear-encoded).
- Red algal chloroplast L13.
- Archaebacterial L13.
- Mammalian L13a (Tum P198).
- Yeast Rp22 and Rp23.

[1326] L13 is a protein of 140 to 250 amino-acid residues. As a signature pattern, a conserved region was selected
located in the C-terminal section of these proteins.
[1327] Consensus pattern: [LIVM]-[KRV]-[GK]-M-[LIV]-[PS]-x(4,5)-[GS]-[NQEKRA]-x(5)-[LIVM]-x-[AIV]-[LFY]-x-[GDN]
[1328] [1] Chan Y.-L., Olvera J., Glueck A., Wool I.G. J. Biol. Chem. 269:5589-5594(1994).
[1329] 497. Ribosomal protein L13e signature

A number of eukaryotic ribosomal proteins can be grouped on the basis of sequence similarities [1]. One of these
families consists of:

- Vertebrate L13 (was previously known as Breast Basic Conserved protein 1 (BBC1)).
- Drosophila L13.
- Plant L13.
- Yeast probable L13 (YM9375.11C).

These proteins have 199 to 218 amino-acid residues. As a signature pattern, a stretch of about 16 residues in the first
third of these proteins was selected.

- Consensus pattern: [KR]-Y-x(2)-K-[LIVM]-R-[STA]-G-[KR]-G-F-[ST]-L-x-E

[1330] [1] Olvera J., Wool I.G. Biochem. Biophys. Res. Commun. 201:102-107(1994).
[1331] 498. Ribosomal protein L14 signature

Ribosomal protein L14 is one of the proteins from the large ribosomal subunit. In eubacteria, L14 is known to bind
directly to the 23S rRNA. It belongs to a family of ribosomal proteins which, on the basis of sequence similarities [1],
groups:

- Eubacterial L14.
- Algal and plant chloroplast L14.
- Cyanelle L14.
- Archaebacterial L14.
- Yeast L17A.
- Mammalian L23.
- Caenorhabditis elegans L23 (B0336.10).
- Higher eukaryotes mitochondrial L14.
- Yeast mitochondrial YmL38 (gene MRPL38).

L14 is a protein of 119 to 137 amino-acid residues. As a signature pattern, a conserved region located in the C-terminal
half of these proteins was selected.

- Consensus pattern: [GA]-[LIV](3)-x(9,10)-[DNS]-G-x(4)-[FY]-x(2)-[NT]-x(2)-V-[LIV]

[1332] [1] Otaka E., Hashimoto T., Mizuta K., Suzuki K. Protein Seq. Data Anal. 5:301-313(1993).
[1333] 499. Ribosomal protein L15 signature

Ribosomal protein L15 is one of the proteins from the large ribosomal subunit. In Escherichia coli, L15 is known to bind
the 23S rRNA. It belongs to a family of ribosomal proteins which, on the basis of sequence similarities [1], groups:

- Eubacterial L15.
- Plant chloroplast L15 (nuclear-encoded).
- Archaebacterial L15.
- Vertebrate L27a.
- Tetrahymena thermophila L29.
- Fungi L27a (L29, CRP-1, CYH2).

L15 is a protein of 144 to 154 amino-acid residues. As a signature pattern, a conserved region was selected in the
C-terminal section of these proteins.

- Consensus pattern: K-[LIVM](2)-[GASL]-x-[GT]-x-[LIVMA
A-x(3)-[LIVM]-x(3)-G

[1334] [1] Otaka E., Hashimoto T., Mizuta K., Suzuki K. Protein Seq. Data Anal. 5:301-313(1993).
[1335] 500. Ribosomal protein L15e signature

A number of eukaryotic and archaebacterial ribosomal proteins can be grouped on the basis of sequence similarities
[1]. One of these families consists of:

- Mammalian L15.
- Insect L15.
- Plant L15.
- Yeast YL10 (L13) (Rp15r).
- Thermoplasma acidophilum L15.

These proteins have about 200 amino acid residues. As a signature pattern, a conserved region was selected located
in the central section.

- Consensus pattern: [DE]-[KR]-A-R-x-L-G-[FY]-x-[SAP]-x(2)-G-[LIVMFY](4)-R-x-R-[IV]-x-R-G

[1] Zwickl P., Lupas A., Baumeister W. Biochem. Biophys. Res. Commun. 209:684-688(1995).
[1336] 501. Ribosomal protein L17 signature

Ribosomal protein L17 is one of the proteins from the large ribosomal subunit. L17 belongs to a family of ribosomal 
proteins which, on the basis of sequence similarities, groups: - Eubacterial L17. 

- Yeast mitochondrial YmL8 (gene MRPL8).

Eubacterial L17 is a protein of 120 to 130 amino-acid residues. Yeast YmL8 is twice as large (238 residues); the sequence
of its N-terminal half is colinear with that of eubacterial L17. As a signature pattern, a conserved region in the N-terminal
section was selected.

- Consensus pattern: I-x-[ST]-[GT]-x(2)-[KR]-x-K-x(6)-[DE]-x-[LIMV]-[LIVMT]-T-x-[STAG]-[KR]
[1337] 502. Ribosomal protein L18e signature

A number of eukaryotic and archaebacterial ribosomal proteins can be grouped on the basis of sequence similarities.
One of these families consists of:

- Vertebrate L18 (known as L14 in Xenopus) [1].
- Plant L18.
- Yeast L18 (Rp28).
- Halobacterium marismortui HL29.
- Sulfolobus acidocaldarius HL29e.

These proteins have 115 to 187 amino-acid residues. A stretch of about 13 residues in the first third of these proteins
has been selected as a signature pattern.

- Consensus pattern: [KRE]-x-L-x(2)-[PS]-[KR]-x(2)-[RH]-[PSA]-x-[LIVM]-[NS]-[LIVM]-x

[1] Puder M., Barnard G.F., Staniunas R.J., Steele G.D. Jr., Chen L.B.
Biochim. Biophys. Acta 1216:134-136(1993).
[1338] 503. Ribosomal L18p family 

It has been shown that the amino-terminal 93 amino acids of Swiss:P09895 are necessary and sufficient to bind 5S
rRNA in vitro. The carboxyl-terminal half of the protein, comprising amino acids 151-296, serves to localize the protein
to the nucleolus [1].
Number of members: 26
[1] Medline: 96212235
Distinct domains in ribosomal protein L5 mediate 5 S rRNA binding and nucleolar localization.
Michael WM, Dreyfuss G;
J Biol Chem 1996;271:11571-11574.
[1339] 504. Ribosomal protein L19 signature

Ribosomal protein L19 is one of the proteins from the large ribosomal subunit. In Escherichia coli, L19 is known to be
located at the 30S-50S ribosomal subunit interface and may play a role in the structure and function of the
aminoacyl-tRNA binding site. It belongs to a family of ribosomal proteins which, on the basis of sequence similarities, groups:

- Eubacterial L19.
- Red algal chloroplast L19.
- Cyanelle L19.

L19 is a protein of 120 to 130 amino-acid residues.
A conserved region in the C-terminal section has been selected as a signature pattern.

- Consensus pattern: [LIVM]-x-[KRGTI]-x-[GSAI]-[KRQDA]-[VG]-[RSN]-x(0,1)-[KR]-[SA]-[KY]-[KLI]-[LYS]-Y-[LIM]-R

[1340] 505. Ribosomal protein L19e signature

A number of eukaryotic and archaebacterial ribosomal proteins can be grouped on the basis of sequence similarities.
One of these families consists of:

- Mammalian ribosomal protein L19 [1].
- Drosophila ribosomal protein L19 [2].
- Slime mold (D. discoideum) vegetative specific protein V14 [3].
- Yeast ribosomal protein L19 (YL14).
- Archaebacterial ribosomal protein L19E.

[1341] These proteins have 148 to 203 amino-acid residues.
A stretch of about 20 residues in the N-terminal part of these proteins has been selected as a signature pattern.

- Consensus pattern: Q-[KR]-R-[LIVM]-x-[SA]-x(4)-[CV]-G-x(3)-[IV]-[WK]-[LIVF]-[DN]-P

[1] Chan Y.-L., Lin A., McNally J., Peleg D., Meyuhas O., Wool I.G. J. Biol. Chem. 262:1111-1115(1987).
[2] Hart K., Klein T., Wilcox M. Mech. Dev. 43:101-110(1993).
[3] Singleton C.K., Manning S.S., Ken R. Nucleic Acids Res. 17:9679-9692(1989).
[1342] 506. Ribosomal protein L1e signature (Ribosomal_L4)

A number of eukaryotic and archaebacterial ribosomal proteins can be grouped on the basis of sequence similarities.
One of these families consists [1,2,3,4] of:

- Vertebrate L1 (L4).
- Drosophila L1.
- Plant L1.
- Yeast L2 (Rp2).
- Fission yeast L2.
- Halobacterium marismortui HmaL4 (HL6).
- Methanococcus jannaschii MJ0177.

These proteins have 246 (archaebacteria) to 427 (human) amino acids. A conserved region in the N-terminal part of
these proteins has been selected as a signature pattern.

- Consensus pattern: N-x(3)-[KRM]-x(2)-A-[LIVT]-x-S-A-[LIV]-x-A-[ST]-[SGA]-x(7)-[RK]-[GS]-H

[1] Ralli F., Gargiulo G., Manzi A., Malva C., Graziani F. Nucleic Acids Res. 17:456-456(1989).
[2] Presutti C., Villa T., Bozzoni I. Nucleic Acids Res. 21:3900-3900(1993).
[3] Bagni C., Mariottini P., Annesi F., Amaldi F. Biochim. Biophys. Acta 1216:475-478(1993).
[4] Arndt E., Kroemer W., Hatakeyama T. J. Biol. Chem. 265:3034-3039(1990).

[1343] 507. Ribosomal protein L2 signature

Ribosomal protein L2 is one of the proteins from the large ribosomal subunit. In Escherichia coli, L2 is known to bind
to the 23S rRNA and to have peptidyltransferase activity. It belongs to a family of ribosomal proteins which, on the
basis of sequence similarities [1,2], groups:

- Eubacterial L2.
- Algal and plant chloroplast L2.
- Cyanelle L2.
- Archaebacterial L2.
- Plant L2.
- Slime mold L2.
- Marchantia polymorpha mitochondrial L2.
- Paramecium tetraurelia mitochondrial L2.
- Fission yeast K5, K37 and KD4.
- Yeast YL6.
- Vertebrate L8.

The best conserved region located in the C-terminal section of these proteins has been selected as
a signature pattern.

- Consensus pattern: P-x(2)-R-G-[STAIV](2)-x-N-[APK]-x-[DE]

[1] Marty I., Meyer Y. Nucleic Acids Res. 20:1517-1522(1992).
[2] Otaka E., Hashimoto T., Mizuta K., Suzuki K. Protein Seq. Data Anal. 5:301-313(1993).

[1344] 508. Ribosomal protein L20 signature
Ribosomal protein L20 is one of the proteins from the large ribosomal subunit. In Escherichia coli, L20 is known to bind
directly to the 23S rRNA. It belongs to a family of ribosomal proteins which, on the basis of sequence similarities [1],
groups:

- Eubacterial L20.
- Algal and plant chloroplast L20.
- Cyanelle L20.

L20 is a protein of about 120 amino-acid residues. A conserved region located in the central section of these proteins
has been selected as a signature pattern.
- Consensus pattern: K-x(3)-[KRC]-x-[LIVM]-W-[IV]-[STNALV]-R-[LIVM]-[NS]-x(3)-[RKHS]

[1] Otaka E., Hashimoto T., Mizuta K., Suzuki K. Protein Seq. Data Anal. 5:301-313(1993).
[1345] 509. Ribosomal protein L21e signature

A number of eukaryotic and archaebacterial ribosomal proteins can be grouped on the basis of sequence similarities. 
One of these families consists of: 

- Mammalian L21 [1].
- Entamoeba histolytica L21 [2].
- Caenorhabditis elegans L21 (C14B9.7).
- Yeast L21E (URP1) [3].
- Halobacterium marismortui HL31 [4].

These proteins have 160 (eukaryotes) or 95 (archaebacteria) amino-acid residues. A conserved region in the central
part of these proteins has been selected as a signature pattern.

- Consensus pattern: G-[DE]-x-V-x(10)-[GV]-x(2)-[FYH]-x(2)-[FY]-x-G-x-T-G

[1] Devi K.R.G., Chan Y.-L., Wool I.G. Biochem. Biophys. Res. Commun. 162:364-370(1989).
[2] Petter R., Rozenblatt S., Nuchamowitz Y., Mirelman D. Mol. Biochem. Parasitol. 56:329-333(1992).
[3] Jank B., Waldherr M., Schweyen R.J. Curr. Genet. 23:15-18(1993).
[4] Hatakeyama T., Kimura M. Eur. J. Biochem. 172:703-711(1988).

[1346] 510. Ribosomal protein L21 signature

Ribosomal protein L21 is one of the proteins from the large ribosomal subunit. In Escherichia coli, L21 is known to bind
to the 23S rRNA in the presence of L20. It belongs to a family of ribosomal proteins which, on the basis of sequence
similarities, groups:

- Eubacterial L21.
- Marchantia polymorpha chloroplast L21.
- Cyanelle L21.
- Spinach chloroplast L21 (nuclear-encoded).

Eubacterial L21 is a protein of about 100 amino-acid residues; the mature form of the spinach chloroplast L21 has 200
residues. A conserved region located in the C-terminal section of these proteins has been selected as a signature
pattern.

- Consensus pattern: [IVT]-x(3)-[KR]-x(3)-[KRQ]-K-x(6)-G-[HF]-R-[RQ]-x(2)-[ST]

[1347] 511. Ribosomal protein L22 signature
Ribosomal protein L22 is one of the proteins from the large ribosomal subunit. In Escherichia coli, L22 is known to bind
23S rRNA. It belongs to a family of ribosomal proteins which, on the basis of sequence similarities [1,2,3], groups:

- Eubacterial L22.
- Algal and plant chloroplast L22 (in legumes L22 is encoded in the nucleus instead of the chloroplast).
- Cyanelle L22.
- Archaebacterial L22.
- Mammalian L17.
- Plant L17.
- Yeast YL17.

A conserved region located in the C-terminal section of these proteins has been selected as a signature pattern.
- Consensus pattern: [RKQN]-x(4)-[RH]-[GAS]-x-G-[KRQS]-x(9)-[HDN]-[LIVM]-x-[LIVMS]-x-[LIVM]

[1] Gantt J.S., Baldauf S.L., Calie P.J., Weeden N.F., Palmer J.D. EMBO J. 10:3073-3078(1991).
[2] Madsen L.H., Kreiberg J.D., Gausing K. Curr. Genet. 19:417-422(1991).
[3] Otaka E., Hashimoto T., Mizuta K., Suzuki K. Protein Seq. Data Anal. 5:301-313(1993).

[1348] 512. Ribosomal protein L23 signature

Ribosomal protein L23 is one of the proteins from the large ribosomal subunit. In Escherichia coli, L23 is known to bind
a specific region on the 23S rRNA; in yeast, the corresponding protein binds to a homologous site on the 26S rRNA
[1]. It belongs to a family of ribosomal proteins which, on the basis of sequence similarities [2,3,4], groups:

- Eubacterial L23.
- Algal and plant chloroplast L23.
- Archaebacterial L23.
- Mammalian L23A.
- Caenorhabditis elegans L23A (F55D10.2).
- Fungi L25.
- Yeast mitochondrial YmL41 (gene MRPL41 or MRP20).

[1349] A small conserved region in the C-terminal section of these proteins, which is probably involved in
rRNA-binding, has been selected as a signature pattern [2].

- Consensus pattern: [RK](2)-[AM]-[IVFYT]-[IV]-[RKT]-L-[STANEQK]-x(7)-[LIVMFT]

[1] El Baradi T.T.A.L., Raue H.A., van de Regt C.H.F., Verbree E.C., Planta R.J. EMBO J. 4:2101-2107(1985).
[2] Raue H.A., Otaka E., Suzuki K. J. Mol. Evol. 28:418-426(1989).
[3] Fearon K., Mason T.L. J. Biol. Chem. 267:5162-5170(1992).
[4] Otaka E., Hashimoto T., Mizuta K. Protein Seq. Data Anal. 5:285-300(1993).

[1350] 513. Ribosomal protein L24 signature 

Ribosomal protein L24 is one of the proteins from the large ribosomal subunit. L24 belongs to a family of ribosomal
proteins which, on the basis of sequence similarities, groups:

- Eubacterial L24.
- Plant chloroplast L24 (nuclear-encoded).
- Red algal L24.
- Vertebrate L26.
- Yeast L26 (YL33).
- Archaebacterial HmaL24 (HL15).
- A probable ribosomal protein from Sulfolobus acidocaldarius [1].

In their mature form, these proteins have 103 to 150 amino-acid residues.
A conserved stretch of 20 residues in their N-terminal section has been selected as a signature pattern.

- Consensus pattern: [GDEN]-D-x-V-x-[IV]-[LIVMA]-x-G-x(2)-[KRA]-[GNQ]-x(2,3)-[GA]-x-[IV]

[1] Ouzounis C., Kyrpides N., Sander C. Nucleic Acids Res. 23:565-570(1995).

[1351] 514. Ribosomal protein L24e signature 

A number of eukaryotic and archaebacterial ribosomal proteins can be grouped on the basis of sequence similarities. 
One of these families consists [1] of: 

- Mammalian ribosomal protein L24.
- Yeast ribosomal protein L30A/B (Rp29) (YL21).
- Kluyveromyces lactis ribosomal protein L30.
- Arabidopsis thaliana ribosomal protein L24 homolog.
- Haloarcula marismortui ribosomal protein HL21/HL22.
- Methanococcus jannaschii MJ1201.

These proteins have 60 to 160 amino-acid residues. The most conserved region, which is located in the N-terminal region of these proteins, has been selected as a signature pattern.

- Consensus pattern: [FY]-x-[GSH]-x(2)-[IV]-x-P-G-x-G-x(2)-[FYV]-x-[KRHE]-x-D

[ 1] Chan Y.-L., Olvera J., Wool I.G. Biochem. Biophys. Res. Commun. 202:1176-1180(1994).
[1352] 515. Ribosomal protein L27 signature 

Ribosomal protein L27 is one of the proteins from the large ribosomal subunit. L27 belongs to a family of ribosomal proteins which, on the basis of sequence similarities [1,2], groups:

- Eubacterial L27.
- Plant chloroplast L27 (nuclear-encoded).
- Algal chloroplast L27.

- Yeast mitochondrial YmL2 (gene MRPL2 or MRP7).

The schematic relationship between these groups of proteins is shown below.

Eub. L27      Nxxxxxxxxx
Algal L27     Nxxxxxxxxx
Plant L27     ttttNxxxxxxxxxxxxx
Yeast MRP7    tttNxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx

't': transit peptide.
'N': N-terminal of mature protein.
'*': position of the pattern.

- Consensus pattern: G-x-[LIVM](2)-x-R-Q-R-G-x(5)-G 
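
The consensus patterns in this listing follow the PROSITE pattern syntax: elements are separated by '-', 'x' stands for any residue, square brackets enclose a set of allowed residues, curly braces enclose excluded residues, and a parenthesized number or range gives a repeat count. As a minimal sketch, and not a part of the procedure described here, such a pattern could be translated into a regular expression and used to scan a sequence. The function name `prosite_to_regex` and the test sequence are hypothetical and chosen only for illustration.

```python
import re

# One pattern element: optional '<' anchor, a core (x, one residue, [set] or {excluded set}),
# an optional repeat count "(n)" or "(n,m)", and an optional '>' anchor.
_ELEMENT = re.compile(
    r"^(<?)(x|[A-Z]|\[[A-Z]+\]|\{[A-Z]+\})(?:\((\d+)(?:,(\d+))?\))?(>?)$"
)


def prosite_to_regex(pattern):
    """Translate a PROSITE-style consensus pattern into a Python regex string."""
    parts = []
    for element in pattern.strip().rstrip(".").split("-"):
        m = _ELEMENT.match(element)
        if m is None:
            raise ValueError(f"unrecognized pattern element: {element!r}")
        start, core, lo, hi, end = m.groups()
        if core == "x":
            token = "."                        # any residue
        elif core.startswith("{"):
            token = "[^" + core[1:-1] + "]"    # excluded residues
        else:
            token = core                       # single residue or [set]
        if lo is not None:
            token += "{" + lo + ("," + hi if hi else "") + "}"
        parts.append(("^" if start else "") + token + ("$" if end else ""))
    return "".join(parts)


# L27 signature as given in the text.
pattern = "G-x-[LIVM](2)-x-R-Q-R-G-x(5)-G"
regex = prosite_to_regex(pattern)

# Made-up sequence for illustration only; it is not taken from any database.
made_up_sequence = "MKAGSIIVRQRGTKFHAGAN"
match = re.search(regex, made_up_sequence)
```

Heavily garbled patterns would raise `ValueError` in this sketch rather than silently produce a wrong expression, which seems the safer choice when the input comes from scanned text.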

[ 1] Elhag G.A., Bourque D.P. Biochemistry 31:6856-6864(1992).

[ 2] Otaka E., Hashimoto T., Mizuta K.
Protein Seq. Data Anal. 5:285-300(1993).

[1353] 516. Ribosomal L28 family 

The ribosomal L28 family includes L28 proteins from bacteria and chloroplasts. The L24 protein from yeast (Swiss: P36525) also contains a region of similarity to prokaryotic L28 proteins. L24 from yeast is also found in the large ribosomal subunit.
Number of members: 24
[1354] 517. Ribosomal protein L29 signature 

Ribosomal protein L29 is one of the proteins from the large ribosomal subunit. L29 belongs to a family of ribosomal proteins which, on the basis of sequence similarities [1], groups:

- Eubacterial L29.
- Red algal L29.
- Archaebacterial L29.
- Mammalian L35.
- Caenorhabditis elegans L35 (ZK652.4).
- Yeast L35.

L29 is a protein of 63 to 138 amino-acid residues.

A conserved region located in the central section of L29 has been selected as a signature pattern.

- Consensus pattern: [KNQS]-[PSTL]-x(2)-[LIMFA]-[KRGSAN]-x-[LIVYSTA]-[KR]-[KRHQS]-[DESTANRL]-[LIV]-A-[KRCQVT]-[LIVMA]

[ 1] Otaka E., Hashimoto T., Mizuta K.
Protein Seq. Data Anal. 5:285-300(1993).
[1355] 518. Ribosomal protein L3 signature 
Ribosomal protein L3 is one of the proteins from the large ribosomal subunit. In Escherichia coli, L3 is known to bind to the 23S rRNA and may participate in the formation of the peptidyltransferase center of the ribosome. It belongs to a family of ribosomal proteins which, on the basis of sequence similarities [1,2,3,4], groups:

- Eubacterial L3.
- Red algal L3.
- Cyanelle L3.
- Archaebacterial Halobacterium marismortui HmaL3 (HL1).
- Yeast L3 (also known as trichodermin resistance protein) (gene TCM1).
- Arabidopsis thaliana L3 (genes ARP1 and ARP2).
- Mammalian L3 (L4).
- Mammalian mitochondrial L3.
- Yeast mitochondrial YmL9 (gene MRPL9).

A conserved region located in the central section of these proteins has been selected as a signature pattern.

- Consensus pattern: [FL]-x(6)-[DN]-x(2)-[AGS]-x-[ST]-x-G-[KRH]-G-x(2)-G-x(3)-R

[ 1] Arndt E., Kroemer W., Hatakeyama T. J. Biol. Chem. 265:3034-3039(1990).
[ 2] Graack H.-R., Grohmann L., Kitakawa M., Schaefer K.L., Kruft V.
Eur. J. Biochem. 206:373-380(1992).
[ 3] Herwig S., Kruft V., Wittmann-Liebold B.
Eur. J. Biochem. 207:877-885(1992).
[ 4] Otaka E., Hashimoto T., Mizuta K., Suzuki K.
Protein Seq. Data Anal. 5:301-313(1993).

[1356] 519. Ribosomal protein L30 signature

Ribosomal protein L30 is one of the proteins from the large ribosomal subunit. L30 belongs to a family of ribosomal proteins which, on the basis of sequence similarities [1], groups:

- Eubacterial L30.
- Archaebacterial L30.
- Drosophila L7.
- Slime mold L7.
- Mammalian L7.
- Fungi L7 (YL8).
- Yeast mitochondrial L33.

L30 from eubacteria are small proteins of about 60 residues, those from archaebacteria are proteins of about 150 residues. Eukaryotic L7 are proteins of about 250 to 270 residues. The schematic relationship between the three groups of proteins is shown below.

Eub. L30    NxxxxxxxxxxC
Arc. L30    NxxxxxxxxxxxxxxxxxxxxxxxxxxxC
Euk. L7     NxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxC

'*': position of the pattern.

The signature pattern for this family of ribosomal proteins spans the N-terminal half of the region common to all these proteins.

- Consensus pattern: [i\n>[LIVM]-x(2)-[LF]-x^ 
VA]-x(2)-[LMFY]-|IVT) 

[ 1] Mizuta K., Hashimoto T., Otaka E.
Nucleic Acids Res. 20:1011-1016(1992).
[1357] 520. Ribosomal protein L31 signature 

Ribosomal protein L31 is one of the proteins from the large ribosomal subunit. L31 is a protein of 66 to 97 amino-acid 
residues which has only been found so far in eubacteria and in some algal chloroplasts. 
A conserved region located in the central section of these proteins has been selected as a signature pattern.

- Consensus pattern: H-P-F-[FY]-[TI]-x(9)-G-R-[AIV]-x-[KRQ]

[1358] 521. Ribosomal protein L31e signature
A number of eukaryotic and archaebacterial ribosomal proteins can be grouped on the basis of sequence similarities. One of these families consists of:

- Mammalian L31 [1].
- Chlamydomonas reinhardtii L31.
- Yeast L34.
- Halobacterium marismortui HL30 [2].

These proteins have 87 to 128 amino-acid residues.

A conserved region located in the central section has been selected as a signature pattern.

- Consensus pattern: V-[KR]-[LIVM]-x(3)-[LIVM]-N-x-[AKH]-x-W-x-[KR]-G

[ 1] Tanaka T., Kuwano Y., Kuzumaki T., Ishikawa K., Ogata K. Eur. J. Biochem. 162:45-48(1987).
[ 2] Bergmann U., Arndt E.
Biochim. Biophys. Acta 1050:56-60(1990).
[1359] 522. Ribosomal protein L33 signature 
Ribosomal protein L33 is one of the proteins from the large ribosomal subunit. In Escherichia coli, L33 has been shown to be on the surface of the 50S subunit. L33 belongs to a family of ribosomal proteins which, on the basis of sequence similarities [1,2,3], groups:

- Eubacterial L33.
- Algal and plant chloroplast L33.
- Cyanelle L33.

L33 is a small protein of 49 to 66 amino-acid residues. A conserved region located in the central section of L33 has been selected as a signature pattern.

- Consensus pattern: Y-x-[ST]-x-[KR]-[NS]-x(4)-[PATQ]-x(1,2)-[LIVM]-[EA]-x(2)-K-[FY]-[CSD]

[ 1] Kruft V., Kapp U., Wittmann-Liebold B. Biochimie 73:855-860(1991).
[ 2] Sharp P.M. Gene 139:129-130(1994).
[ 3] Otaka E., Hashimoto T., Mizuta K.
Protein Seq. Data Anal. 5:285-300(1993).

[1360] 523. Ribosomal protein L34 signature 

[1361] Ribosomal protein L34 is one of the proteins from the large subunit of the prokaryotic ribosome. It is a small basic protein of 44 to 51 amino-acid residues [1]. L34 belongs to a family of ribosomal proteins which, on the basis of sequence similarities, groups:

- Eubacterial L34.
- Red algal chloroplast L34.
- Cyanelle L34.

A conserved region that corresponds to the N-terminal half of L34 has been selected as a signature pattern.

- Consensus pattern: K-[RG]-T-[FYWL]-[EQS]-x(5)-[KRHS]-x(4,5)-G-F-x(2)-R

[ 1] Old I.G., Margarita D., Saint Girons I.
Nucleic Acids Res. 20:6097-6097(1992).

[1362] 524. Ribosomal protein L34e signature 

A number of eukaryotic and archaebacterial ribosomal proteins can be grouped on the basis of sequence similarities. 
One of these families consists of: 

- Mammalian L34.
- Mosquito L31 [1].
- Plant L34 [2].
- Yeast putative ribosomal protein YIL052c.
- Methanococcus jannaschii MJ0655.

These proteins have 89 to 129 amino-acid residues.

A conserved region located in the N-terminal section of these proteins has been selected as a signature pattern.

- Consensus pattern: Y-x-[ST]-x-S-[NY]-x(5)-[KR]-T-P-G

[ 1] Lan Q., Niu L.L., Fallon A.M.
Biochim. Biophys. Acta 1218:460-462(1994).
[ 2] Gao J., Kim S.R., Chung Y.Y., Lee J.M., An G.
Plant Mol. Biol. 25:761-770(1994).

[1363] 525. Ribosomal protein L35Ae signature 

A number of eukaryotic and archaebacterial ribosomal proteins can be grouped on the basis of sequence similarities. One of these families consists of:

- Vertebrate L35A.
- Caenorhabditis elegans L35A (F10E7.7).
- Yeast L37A/L37B (Rp47).
- Pyrococcus woesei L35A homolog [1].

These proteins have 87 to 110 amino-acid residues.

A highly conserved stretch of 22 residues in the C-terminal part of these proteins has been selected as a signature pattern.

- Consensus pattern: G-K-[LIVM]-x-R-x-H-G-x(2)-G-x-V-x-A-x-F-x(3)-[LI]-P

[ 1] Ouzounis C., Kyrpides N., Sander C.
Nucleic Acids Res. 23:565-570(1995).

[1364] 526. Ribosomal protein L36 signature 

Ribosomal protein L36 is the smallest protein from the large subunit of the prokaryotic ribosome. It belongs to a family of ribosomal proteins which, on the basis of sequence similarities [1], groups:

- Eubacterial L36.
- Algal and plant chloroplast L36.
- Cyanelle L36.

L36 is a small basic and cysteine-rich protein of 37 amino-acid residues. As a signature pattern, a conserved region that corresponds to positions 11 to 36 in L36 and includes three conserved cysteine residues has been developed.

- Consensus pattern: C-x(2)-C-x(2)-[LIVM]-x-R-x(3)-[LIVMN]-x-[LIVM]-x-C-x(3,4)-[KR]-H-x-Q-x-Q

[ 1] Otaka E., Hashimoto T., Mizuta K. Protein Seq. Data Anal. 5:285-300(1993).
[1365] 527. Ribosomal protein L36e signature 

A number of eukaryotic ribosomal proteins can be grouped on the basis of sequence similarities. One of these families consists of:

- Mammalian L36 [1].
- Drosophila L36 (M(1)1B).
- Caenorhabditis elegans L36 (F37C12.4).
- Candida albicans L39.
- Yeast YL39.

These proteins have 99 to 104 amino acids.
A conserved region in the central part of these proteins has been selected as a signature pattern.

- Consensus pattern: P-Y-E-[KR]-R-x-[LIVM]-[DE]-[LIVM](2)-[KR]

[ 1] Chan Y.-L., Paz V., Olvera J., Wool I.G.
Biochem. Biophys. Res. Commun. 192:849-853(1993).
[1366] 528. Ribosomal protein L39e signature 

A number of eukaryotic and archaebacterial ribosomal proteins can be grouped on the basis of sequence similarities. 
One of these families consists of: 

- Mammalian L39 [1].
- Plants L39.
- Yeast L46 [2].
- Archaebacterial L39e [3].

These proteins are very basic. About 50 residues long, they are the smallest proteins of eukaryotic-type ribosomes. A conserved region in the C-terminal section of these proteins has been selected as a signature pattern.

- Consensus pattern: [KRA]-T-x(3)-[LIVM]-[KRQF]-x-[NHS]-x(3)-R-[NHY]-W-R-R

[ 1] Lin A., McNally J., Wool I.G. J. Biol. Chem. 259:487-490(1984).
[ 2] Leer R.J., van Raamsdonk-Duin M.M.C., Kraakman P., Mager W.H., Planta R.J. Nucleic Acids Res. 13:701-709(1985).
[ 3] Ramirez C., Louie K.A., Matheson A.T. FEBS Lett. 250:416-418(1989).

[1367] 529. Ribosomal L40e family 

Bovine L40 has been identified as a secondary RNA binding protein [1]. L40 is fused to a ubiquitin protein [2].
Number of members: 27

[1] Medline: 88203200
RNA binding proteins of the large subunit of bovine mitochondrial ribosomes.
Piatyszek MA, Denslow ND, O'Brien TW;
Nucleic Acids Res 1988;16:2565-2583.

[2] Medline: 96011832
The carboxyl extensions of two rat ubiquitin fusion proteins are ribosomal proteins S27a and L40.
Chan YL, Suzuki K, Wool IG;
Biochem Biophys Res Commun 1995;215:682-690.
[1368] 530. (Ribosomal L44) Ribosomal protein L44e signature 

A number of eukaryotic and archaebacterial ribosomal proteins can be grouped on the basis of sequence similarities. One of these families consists of:

- Mammalian L44 [1].
- Trypanosoma brucei L44.
- Caenorhabditis elegans L44 (C09H10.2).
- Fungal L44 (L41).
- Halobacterium marismortui LA [2].

These proteins have 92 to 105 amino-acid residues.

A conserved region located in the C-terminal part of these proteins has been selected as a signature pattern.

- Consensus pattern: K-x-[TV]-K-K-x(2)-L-[KR]-x(2)-C

[ 1] Gallagher M.J., Chan Y.-L., Lin A., Wool I.G. DNA 7:269-273(1988).
[ 2] Bergmann U., Wittmann-Liebold B.
Biochim. Biophys. Acta 1173:195-200(1993).

[1369] 531. Ribosomal protein L5 signature

Ribosomal protein L5 is one of the proteins from the large ribosomal subunit. In Escherichia coli, L5 is known to be involved in binding 5S RNA to the large ribosomal subunit. It belongs to a family of ribosomal proteins which, on the basis of sequence similarities [1,2,3,4], groups:

- Eubacterial L5.
- Algal chloroplast L5.
- Cyanelle L5.
- Archaebacterial L5.
- Mammalian L11.
- Tetrahymena thermophila L21.
- Slime mold L5 (V18).
- Yeast L16 (39A).
- Plant mitochondrial L5.

L5 is a protein of about 180 amino-acid residues.

A conserved region located in the first third of these proteins has been selected as a signature pattern.

- Consensus pattern: [LIVM]-x(2)-ILIVMHSTAVCh^ 


[ 1] Hatakeyama T., Hatakeyama T. Biochim. Biophys. Acta 1039:343-347(1990).
[ 2] Rosendahl G., Andreasen P.H., Kristiansen K. Gene 98:161-167(1991).
[ 3] Yang D., Gunther I., Matheson A.T., Auer J., Spicker G., Boeck A. Biochimie 73:679-682(1991).
[ 4] Otaka E., Hashimoto T., Mizuta K., Suzuki K. Protein Seq. Data Anal. 5:301-313(1993).

[1370] 532. ribosomal L5P family C-terminus 

[1371] This region is found associated with Ribosomal_L5. Number of members: 60 
[1372] 533. Ribosomal protein L6 signatures 

[1373] Ribosomal protein L6 is one of the proteins from the large ribosomal subunit. In Escherichia coli, L6 is known to bind directly to the 23S rRNA and is located at the aminoacyl-tRNA binding site of the peptidyltransferase center. It belongs to a family of ribosomal proteins which, on the basis of sequence similarities [1,2,3,4], groups:

- Eubacterial L6.
- Algal chloroplast L6.
- Cyanelle L6.
- Archaebacterial L6.
- Marchantia polymorpha mitochondrial L6.
- Yeast mitochondrial YmL6 (gene MRPL6).
- Mammalian L9.
- Drosophila L9.
- Plants L9.
- Yeast L9 (YL11).

[1374] While all the above proteins are evolutionarily related, it is very difficult to derive a pattern that will find them all. Two patterns were therefore created, the first to detect eubacterial, cyanelle and mitochondrial L6, the second to detect archaebacterial L6 as well as eukaryotic L9:

- Consensus pattern: [PS]-[DENS]-x-Y-K-[GA]-K-G-[LIVM]

- Consensus pattern: Q-x(3)-[LIVM]-x(2)-[KR]-x(2)-R-x-F-x-D-G-[LIVM]-Y-[LIVM]-x(2)-[KR]
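
Because the L6 family is covered by two patterns rather than one, a sequence is a candidate member if either pattern is found. The following is a minimal sketch of that either/or test, assuming the two patterns read [PS]-[DENS]-x-Y-K-[GA]-K-G-[LIVM] and Q-x(3)-[LIVM]-x(2)-[KR]-x(2)-R-x-F-x-D-G-[LIVM]-Y-[LIVM]-x(2)-[KR]. The regular expressions are hand translations of those readings, and the labels, function name and test sequences are hypothetical.

```python
import re

# Hand-translated regular expressions for the two L6 patterns (assumed readings).
L6_PATTERNS = {
    "L6_1": re.compile(r"[PS][DENS].YK[GA]KG[LIVM]"),
    "L6_2": re.compile(r"Q.{3}[LIVM].{2}[KR].{2}R.F.DG[LIVM]Y[LIVM].{2}[KR]"),
}


def matching_l6_patterns(sequence):
    """Return the labels of every L6 pattern found anywhere in the sequence."""
    return [label for label, rx in L6_PATTERNS.items() if rx.search(sequence)]


# Made-up sequences for illustration only; they are not real ribosomal proteins.
seq_a = "AAPDKYKGKGIAA"                # built to contain the first pattern
seq_b = "MMQAAALAAKAARAFADGIYVAAKMM"   # built to contain the second pattern
seq_c = "MMMM"                         # contains neither

hits_a = matching_l6_patterns(seq_a)
hits_b = matching_l6_patterns(seq_b)
hits_c = matching_l6_patterns(seq_c)
```

Returning the labels rather than a single boolean keeps track of which of the two groups a hit points to.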

[1] Suzuki K., Olvera J., Wool I.G. Gene 93:297-300(1990).

[2] Schwank S., Harrer R., Schueller H.-J., Schweizer E. Curr. Genet. 24:136-140(1993).

[3] Golden B.L., Ramakrishnan V., White S.W. EMBO J. 12:4901-4908(1993).

[4] Otaka E., Hashimoto T., Mizuta K., Suzuki K. Protein Seq. Data Anal. 5:301-313(1993).

[1375] 534. Ribosomal protein L6e signature

A number of eukaryotic and archaebacterial ribosomal proteins can be grouped on the basis of sequence similarities. 
One of these families consists of: 

- Mammalian ribosomal protein L6 (L6 was previously known as TAX-responsive enhancer element binding protein 107).
- Caenorhabditis elegans ribosomal protein L6 (R151.3).
- Yeast ribosomal protein YL16A/YL16B.
- Mesembryanthemum crystallinum ribosomal protein YL16-like.

These proteins have 175 (yeast) to 287 (mammalian) amino acids. A highly conserved region in the central part of these proteins has been selected as a signature pattern.

- Consensus pattern: N-x(2)-P-L-R-R-x(4)-[FY]-V-I-A-T-S-x-K

[1376] 535. Ribosomal protein L7Ae signature

A number of eukaryotic and archaebacterial ribosomal proteins can be grouped on the basis of sequence similarities. One of these families consists of:

- Vertebrate L7A (SURF3) [1].
- Plant L7A.
- Yeast L7A (YL5) (Rp6).
- Yeast protein NHP2 [2].
- Yeast hypothetical protein YEL026w.
- Bacillus subtilis hypothetical protein ylxQ.
- Halobacterium marismortui Hs6.
- Methanococcus jannaschii MJ1203.

[1377] These proteins have 100 to 265 amino-acid residues.

A conserved region located in the central section has been selected as a signature pattern.

- Consensus pattern: [CA]-x(4)-[IV]-P-[FY]-x(2)-[LIVM]-x-[GSQ]-[KRQ]-x(2)-L-G

[ 1] Colombo P., Yon J., Garson K., Fried M. Proc. Natl. Acad. Sci. U.S.A. 89:6358-6362(1992).
[ 2] Kolodrubetz D., Burgum A. Yeast 7:79-90(1991).

[1378] 536. Ribosomal protein L9 signature 
Ribosomal protein L9 is one of the proteins from the large ribosomal subunit. In Escherichia coli, L9 is known to bind directly to the 23S rRNA. It belongs to a family of ribosomal proteins which, on the basis of sequence similarities [1,2], groups:

- Eubacterial L9.
- Cyanobacterial L9.
- Plant chloroplast L9 (nuclear-encoded).
- Red algal chloroplast L9.

A conserved region located in the N-terminal section of these proteins has been selected as a signature pattern.

- Consensus pattern: G-x(2)-[GN]-x(4)-V-x(2)-G-[FY]-x(2)-N-[FY]-L-x(5)-[GA]-x(3)-[STN]

[ 1] Hoffman D.W., Davies C., Gerchman S.E., Kycia J.H., Porter S.J., White S.W., Ramakrishnan V. EMBO J. 13:205-212(1994).

[ 2] Otaka E., Hashimoto T., Mizuta K., Suzuki K. Protein Seq. Data Anal. 5:301-313(1993).

[1379] 537. Ribosomal protein S10 signature 
Ribosomal protein S10 is one of the proteins from the small ribosomal subunit. In Escherichia coli, S10 is known to be involved in binding tRNA to the ribosomes. It belongs to a family of ribosomal proteins which, on the basis of sequence similarities [1], groups:

- Eubacterial S10.
- Algal chloroplast S10.
- Cyanelle S10.
- Archaebacterial S10.
- Marchantia polymorpha and Prototheca wickerhamii mitochondrial S10.
- Arabidopsis thaliana mitochondrial S10 (nuclear encoded).
- Vertebrate S20.
- Plant S20.
- Yeast URP2.

S10 is a protein of about 100 amino-acid residues.
[1380] A conserved region located in the center of these proteins has been selected as a signature pattern.

- Consensus pattern: [AV]-x(3)-[GDNSR]-[LIVMSTA]-x(3)-G-P-[LIVM]-x-[LIVM]-P-T

[ 1] Otaka E., Hashimoto T., Mizuta K.
Protein Seq. Data Anal. 5:285-300(1993).

[1381] 538. Ribosomal protein S11 signature 

Ribosomal protein S11 [1] plays an essential role in selecting the correct tRNA in protein biosynthesis. It is located on the large lobe of the small ribosomal subunit. S11 belongs to a family of ribosomal proteins which, on the basis of sequence similarities, groups [2]:

- Eubacterial S11.
- Algal and plant chloroplast S11.
- Cyanelle S11.
- Archaebacterial S11.
- Marchantia polymorpha and Prototheca wickerhamii mitochondrial S11.
- Acanthamoeba castellanii mitochondrial S11.
- Neurospora crassa S14 (crp-2).
- Yeast S14 (RP59 or CRY1).
- Mammalian, Drosophila, Trypanosoma, and plant S14.
- Caenorhabditis elegans S14 (F37C12.9).

One of the best conserved regions in these proteins was selected as a signature pattern.

- Consensus pattern: [LIVMF]-x-[GSTAC]-[LIVMF]-x(2)-[GSTAL]-x(0,1)-[GSN]-[LIVMF]-x-[LIVM]-x(4)-[DEN]-x-T-P-x-[PA]-[STCH]-[DN]

[ 1] Kimura M., Kimura J., Hatakeyama T. FEBS Lett. 240:15-20(1988).
[ 2] Otaka E., Hashimoto T., Mizuta K.
Protein Seq. Data Anal. 5:285-300(1993).

[1382] 539. Ribosomal protein S12 signature

Ribosomal protein S12 is one of the proteins from the small ribosomal subunit. In Escherichia coli, S12 is known to be involved in the translation initiation step. It is a very basic protein of 120 to 150 amino-acid residues. S12 belongs to a family of ribosomal proteins which, on the basis of sequence similarities [1], groups:

- Eubacterial S12.
- Archaebacterial S12.
- Algal and plant chloroplast S12.
- Cyanelle S12.
- Protozoa and plant mitochondrial S12.
- Yeast S28.
- Drosophila mitochondrial protein tko (Technical KnockOut).
- Mammalian S23.

The best conserved regions in these proteins, located in the center of each sequence, have been selected as a signature pattern.

- Consensus pattern: [RK]-x-P-N-S-[AR]-x-R

[ 1] Otaka E., Hashimoto T., Mizuta K.
Protein Seq. Data Anal. 5:285-300(1993).
[1383] 540. Ribosomal protein S12e signature 

A number of eukaryotic ribosomal proteins can be grouped on the basis of sequence similarities. One of these families consists of:

- Vertebrate S12 [1].
- Trypanosoma brucei S12 [2].
- Caenorhabditis elegans S12 (F54E7.2).
- Drosophila S12.
- Yeast S12.

These proteins have 130 to 150 amino acids.
A conserved region in the N-terminal part of these proteins has been selected as a signature pattern.

- Consensus pattern: A-L-[KRQP]-x-V-L-x(2)-[SA]-x(3)-[DN]-G-L

[ 1] Lin A., Chan Y.-L., Jones R., Wool I.G.
J. Biol. Chem. 262:14343-14351(1987).
[ 2] Marchal C., Ismaili N., Pays E. Mol. Biochem. Parasitol. 57:331-334(1993).
[1384] 541. Ribosomal protein S13 signature

Ribosomal protein S13 is one of the proteins from the small ribosomal subunit. In Escherichia coli, S13 is known to be involved in binding fMet-tRNA and, hence, in the initiation of translation. It is a basic protein of 115 to 177 amino-acid residues and belongs to a family of ribosomal proteins which, on the basis of sequence similarities [1,2], groups:

- Eubacterial S13.
- Plant chloroplast S13 (nuclear encoded).
- Red algal chloroplast S13.
- Cyanelle S13.
- Archaebacterial S13.
- Plant mitochondrial S13.
- Mammalian and plant S18.

The best conserved regions in these proteins, located in their C-terminal part, have been selected as a signature pattern.

- Consensus pattern: [KRQS]-G-x-R-H-x(2)-[GSNH]-x(2)-[LIVMC]-R-G-Q

[ 1] Chan Y.-L., Paz V., Wool I.G.
Biochem. Biophys. Res. Commun. 178:1212-1218(1991).
[ 2] Otaka E., Hashimoto T., Mizuta K.
Protein Seq. Data Anal. 5:285-300(1993).

[1385] 542. Ribosomal protein S14p/S29e (Ribosomal protein S14 signature)

[1386] Ribosomal protein S14 is one of the proteins from the small ribosomal subunit. In Escherichia coli, S14 is known to be required for the assembly of 30S particles and may also be responsible for determining the conformation of 16S rRNA at the A site. It belongs to a family of ribosomal proteins which, on the basis of sequence similarities [1,2], groups:

- Eubacterial S14.
- Algal and plant chloroplast S14.
- Cyanelle S14.
- Archaebacterial Methanococcus vannielii S14.
- Plant mitochondrial S14.
- Yeast mitochondrial MRP2.
- Mammalian S29.
- Yeast YS29A/B.

[1387] S14 is a protein of 53 to 115 amino-acid residues. Our signature pattern is based on the few conserved positions located in the center of these proteins.

[1388] Consensus pattern: [RP]-x(0,1)-C-x(11,12)-[LIVMF]-x-[LIVMF]-[SC]-[RG]-x(3)-[RN]

[1] Chan Y.-L., Suzuki K., Olvera J., Wool I.G. Nucleic Acids Res. 21:649-655(1993).
[2] Otaka E., Hashimoto T., Mizuta K. Protein Seq. Data Anal. 5:285-300(1993).

[1389] 543. Ribosomal protein S15 signature
Ribosomal protein S15 is one of the proteins from the small ribosomal subunit. In Escherichia coli, this protein binds to 16S ribosomal RNA and functions at early steps in ribosome assembly. It belongs to a family of ribosomal proteins which, on the basis of sequence similarities [1,2], groups:

- Eubacterial S15.
- Archaebacterial Halobacterium marismortui HmaS15 (HS11).
- Plant chloroplast S15.
- Yeast mitochondrial S28.
- Mammalian S13.
- Brugia pahangi and Wuchereria bancrofti S13 (S15).
- Yeast S13 (YS15).

S15 is a protein of 80 to 250 amino-acid residues.

A conserved region located in the C-terminal part of these proteins has been selected as a signature pattern.

- Consensus pattern: [LIVM]-x(2)-H-[LIVMFY]-x(5)-D-x(2)-[SAGN]-x(3)-[LF]-x(9)-[LIVM]-x(2)-[FY]

[ 1] Dang H., Ellis S.R.
Nucleic Acids Res. 18:6895-6901(1990).
[ 2] Otaka E., Hashimoto T., Mizuta K.
Protein Seq. Data Anal. 5:285-300(1993).

[1390] 544. Ribosomal protein S16 signature 

[1391] Ribosomal protein S16 is one of the proteins from the small ribosomal subunit. It belongs to a family of ribosomal proteins which, on the basis of sequence similarities [1], groups:

- Eubacterial S16.
- Algal and plant chloroplast S16.
- Cyanelle S16.
- Neurospora crassa mitochondrial S24 (cyt-21).

[1392] S16 is a protein of about 100 amino-acid residues. A conserved region located in the N-terminal extremity of these proteins has been selected as a signature pattern.
[1393] Consensus pattern: [LIVMT]-x-[LIVM]-[KR]-L-[STAK]-R-x-G-[AKR]
[1394] [1] Otaka E., Hashimoto T., Mizuta K. Protein Seq. Data Anal. 5:285-300(1993).
[1395] 545. Ribosomal protein S17 signature 

Ribosomal protein S17 is one of the proteins from the small ribosomal subunit. In Escherichia coli, S17 is known to bind specifically to the 5' end of 16S ribosomal RNA and is thought to be involved in the recognition of termination codons. It belongs to a family of ribosomal proteins which, on the basis of sequence similarities [1,2,3], groups:

- Eubacterial S17.
- Plant chloroplast S17 (nuclear encoded).
- Red algal chloroplast S17.
- Cyanelle S17.
- Archaebacterial S17.
- Mammalian and plant cytoplasmic S11.
- Yeast S18a and S18b (RP41; YS12).

The best conserved regions located in the C-terminal sections of these proteins have been selected as a signature pattern.

- Consensus pattern: G-D-x-[LIV]-x-[LIVA]-x-[QEK]-x-[RK]-P-[LIV]-S

[ 1] Gantt J.S., Thompson M.D. J. Biol. Chem. 265:2763-2767(1990).
[ 2] Herfurth E., Hirano H., Wittmann-Liebold B.
Biol. Chem. Hoppe-Seyler 372:955-961(1991).
[ 3] Otaka E., Hashimoto T., Mizuta K.
Protein Seq. Data Anal. 5:285-300(1993).

[1396] 546. Ribosomal protein S17e signature 
A number of eukaryotic and archaebacterial ribosomal proteins can be grouped on the basis of sequence similarities. One of these families consists of:

- Vertebrates S17 [1].
- Drosophila S17 [2].
- Neurospora crassa S17 (crp-3).
- Yeast S17a (RP51A) and S17b (RP51B) [3].
- Methanococcus jannaschii MJ0245.

These proteins have from 63 (in archaebacteria) to 130 to 146 amino acids and are highly conserved. A region in the central part of these proteins has been selected as a signature.

- Consensus pattern: A-x-I-x-[ST]-K-x-L-R-N-[KR]-I-A-G-[FY]-x-T-H

[ 1] Chen I.-T., Roufa D.J. Gene 70:107-116(1988).
[ 2] Maki C., Rhoads D.D., Stewart M.J., van Slyke B., Denell R.E., Roufa D.J. Gene 79:289-298(1989).
[ 3] Abovich N., Rosbash M.
Mol. Cell. Biol. 4:1871-1879(1984).

[1397] 547. Ribosomal protein S18 signature 

Ribosomal protein S18 is one of the proteins from the small ribosomal subunit. In Escherichia coli, S18 has been implicated in aminoacyl-tRNA binding [1]. It appears to be situated at the tRNA A-site of the ribosome. It belongs to a family of ribosomal proteins which, on the basis of sequence similarities [2], groups:

- Eubacterial S18.
- Algal and plant chloroplast S18.
- Cyanelle S18.

As a signature pattern, a conserved region in the central section of the protein has been selected. This region contains two basic residues which may be involved in RNA-binding.

- Consensus pattern: [IV]-[DY]-Y-x(2)-[LIVMT]-x(2)-[LIVM]-x(2)-[FYT]-[LIVM]-[ST]-[DERP]-x-[GY]-K-[LIVM]-x(3)-R-[LIVMAS]

[ 1] McDougall J., Choli T., Kruft V., Kapp U., Wittmann-Liebold B. FEBS Lett. 245:253-260(1989).
[ 2] Otaka E., Hashimoto T., Mizuta K. Protein Seq. Data Anal. 5:285-300(1993).
[1398] 548. Ribosomal protein S19 signature 
Ribosomal protein S19 is one of the proteins from the small ribosomal subunit. In Escherichia coli, S19 is known to form a complex with S13 that binds strongly to 16S ribosomal RNA. S19 belongs to a family of ribosomal proteins which, on the basis of sequence similarities [1,2], groups:

- Eubacterial S19.
- Algal and plant chloroplast S19.
- Cyanelle S19.
- Archaebacterial S19.
- Plant mitochondrial S19.
- Eukaryotic S15 ('rig' protein).

S19 is a protein of 88 to 144 amino-acid residues. Our signature pattern is based on the few conserved positions located in the C-terminal section of these proteins.

- Consensus pattern: [STDNQ]-G-[KRQM]-x(6)-[LIVM]-x(4)-[LIVM]-[GSD]-x(2)-[LF]-[GAS]-[DE]-F-x(2)-[ST]

[ 1] Kitagawa M., Takasawa S., Kikuchi N., Itoh T., Teraoka H., Yamamoto H., Okamoto H. FEBS Lett. 283:210-214(1991).

[ 2] Otaka E., Hashimoto T., Mizuta K.
Protein Seq. Data Anal. 5:285-300(1993).

[1399] 549. Ribosomal protein S19e signature

A number of eukaryotic and archaebacterial ribosomal proteins can be grouped on the basis of sequence similarities [1,2]. One of these families consists of:

- Mammalian S19.
- Drosophila S19.
- Ascaris lumbricoides S19g (ALEP-1) and S19s.
- Yeast YS16 (RP55A and RP55B).
- Aspergillus S16.
- Halobacterium marismortui HS12.

These proteins have 143 to 155 amino acids.

A well conserved stretch of 20 residues in the C-terminal part of these proteins has been selected as a signature pattern.

- Consensus pattern: P-x(6)-[SAN]-x(2)-[LIVMA]-x-R-x-[ALIV]-[LV]-Q-x-L-[EQ]

[ 1] Etter A., Aboutanos M., Tobler H., Mueller F.
Proc. Natl. Acad. Sci. U.S.A. 88:1593-1596(1991).
[ 2] Suzuki K., Olvera J., Wool I.G. Biochimie 72:299-302(1990).

[1400] 550. Ribosomal protein S2 signatures

Ribosomal protein S2 is one of the proteins from the small ribosomal subunit. S2 belongs to a family of ribosomal proteins which, on the basis of sequence similarities [1,2], groups:

- Eubacterial S2.
- Algal and plant chloroplast S2.
- Cyanelle S2.
- Archaebacterial S2.
- Higher eukaryotes P40 (previously thought to be a laminin receptor).
- Yeast NAB1.
- Plant mitochondrial S2.
- Yeast mitochondrial MRP4.

S2 is a protein of 235 to 394 amino-acid residues.

Two conserved regions have been selected as signature patterns. One is located in the N-terminal section and the other in the central section.

- Consensus pattern: [LIVMFA]-x(2)-[LIVMFYC](2)-x-[STAC]-[GSTANQEKR]-[STALV]-[HY]-[LIVMF]-G

- Consensus pattern: P-x(2)-[LIVMF](2)-[LIVMS]-x-[GDN]-x(3)-[DENL]-x(3)-[LIVM]-x-E-x(4)-[GNQKRH]-[LIVM]-[AP]

[ 1] Davis S.C., Tzagoloff A., Ellis S.R.
J. Biol. Chem. 267:5508-5514(1992).
[ 2] Tohgo A., Takasawa S., Munakata H., Yonekura H., Hayashi N., Okamoto H. FEBS Lett. 340:133-138(1994).

[1401] 551. Ribosomal protein S21 signature

[1402] Ribosomal protein S21 is one of the proteins from the small ribosomal subunit. So far S21 has only been found in eubacteria. It is a protein of 55 to 70 amino-acid residues. A conserved region in the N-terminal section of the protein has been selected as a signature pattern.
[1403] Consensus pattern: [DE]-x-A-[LY]-[KR]-R-F-K-[KR]-x(3)-[KR]
[1404] 552. Ribosomal protein S21e signature 

A number of eukaryotic ribosomal proteins can be grouped on the basis of sequence similarities. One of these families
consists of:

- Mammalian S21 [1].
- Caenorhabditis elegans S21 (F37C12.11).
- Rice S21 [2].
- Yeast S21 (Ys25) [3].
- Fission yeast S28 [4].

These proteins have 82 to 87 amino acids. 

A perfectly conserved nonapeptide in the N-terminal part of these proteins has been selected as a signature pattern.

Consensus pattern: L-Y-V-P-R-K-C-S-[SA] 
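
The consensus patterns in these entries use a compact notation: positions are separated by "-", "x" stands for any residue, square brackets list the residues allowed at a position, curly brackets list residues that are excluded, and a number in parentheses gives a repeat count or range. As a minimal sketch of how such a pattern can be applied to a sequence, the following translates it into a regular expression. The helper name `prosite_to_regex` and the test peptide are assumptions made for illustration and do not come from the specification.

```python
import re

# One pattern element: x, a single residue, [allowed] or {forbidden},
# optionally followed by a repeat count (n) or range (n,m).
_ELEMENT = re.compile(r"(x|[A-Z]|\[[A-Z]+\]|\{[A-Z]+\})(?:\((\d+)(?:,(\d+))?\))?")


def prosite_to_regex(pattern):
    """Translate a consensus pattern such as 'L-Y-V-P-R-K-C-S-[SA]' into a regex string."""
    pieces = []
    for element in pattern.strip().split("-"):
        match = _ELEMENT.fullmatch(element.strip())
        if match is None:
            raise ValueError(f"unrecognised pattern element: {element!r}")
        core, low, high = match.groups()
        if core == "x":
            core = "."                       # any residue
        elif core.startswith("{"):
            core = "[^" + core[1:-1] + "]"   # any residue except those listed
        if low is not None:
            core += "{" + low + ("," + high if high else "") + "}"
        pieces.append(core)
    return "".join(pieces)


# Hypothetical peptide, invented for illustration only.
peptide = "MENDAGEFVDLYVPRKCSASNRII"
s21e = re.compile(prosite_to_regex("L-Y-V-P-R-K-C-S-[SA]"))
hit = s21e.search(peptide)
print(hit.group(), hit.start() + 1)  # matched segment and its 1-based position
```

A pattern that is truncated or garbled in the text (for example one ending in a dangling "-") is rejected with a `ValueError` rather than silently converted.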


[ 1] Bhat K.S., Morrison S.G. Nucleic Acids Res. 21:2939-2939(1993).
[ 2] Nishi R., Hashimoto H., Uchimiya H., Kato A. Biochim. Biophys. Acta 1216:113-114(1993).
[ 3] Suzuki K., Otaka E. Nucleic Acids Res. 16:6223-6223(1988).
[ 4] Itoh T., Otaka E., Matsui K.A. Biochemistry 24:7418-7423(1985).

[1405] 553. Ribosomal protein S24e signature 
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A number of eukaryotic and archaebacterial ribosomal proteins can be grouped on the basis of sequence similarities. 
One of these families consists of: 

- Vertebrate S24 [1].
- Yeast Rp50.
- Mucor racemosus S24 [2].
- Halobacterium marismortui HS15 [3].
- Methanococcus jannaschii MJ0394.

These proteins have 101 to 148 amino acids.

A well conserved stretch in the central part of these proteins has been selected as a signature pattern. 

- Consensus pattern: [FYA]-G-x(2)-[KR]-[STA]-x-G-[FY]-[GA]-x-[LIVM]-Y-[DN]-[SDN]

[ 1] Brown S.J., Jewell A., Maki C.G., Roufa D.J. Gene 91:293-296(1990).
[ 2] Sosa L., Fonzi W.A., Sypherd P.S.

[1406] Nucleic Acids Res. 17:9319-9331(1989).
[ 3] Kimura J., Arndt E., Kimura M. FEBS Lett. 224:65-70(1987).
[1407] 554. Ribosomal protein S26e signature 

A number of eukaryotic ribosomal proteins can be grouped on the basis of sequence similarities. One of these families
consists of:

- Mammalian S26 [1].
- Octopus S26 [2].
- Drosophila S26 (DS31) [3].
- Plant cytoplasmic S26.
- Fungi S26 [4].

These proteins have 114 to 127 amino acids.

A conserved octapeptide in the central part of these proteins has been selected as a signature pattern. 

Consensus pattern: [YH]-C-V-S-C-A-I-H

[ 1] Kuwano Y., Nakanishi O., Nabeshima Y., Tanaka T., Ogata K. J. Biochem. 97:983-992(1985).
[ 2] Zinov'eva R.D., Tomarev S.I. Dokl. Akad. Nauk SSSR 304:464-469(1989).
[ 3] Itoh N., Ohta K., Ohta M., Kawasaki T., Yamashina I. Nucleic Acids Res. 17:2121-2121(1989).
[ 4] Wu M., Tan H. Gene 150:401-402(1994).

[1408] 555. Ribosomal protein S28e signature 

A number of eukaryotic and archaebacterial ribosomal proteins can be grouped on the basis of sequence similarities.
One of these families consists of:

- Mammalian S28 [1].
- Plant S28 [2].
- Fungi S33 [3].
- Methanococcus jannaschii MJ1202.

These proteins have from 64 to 78 amino acids.

A highly conserved nonapeptide from the C-terminal extremity of these proteins has been selected as a signature 
pattern. 

Consensus pattern: E-[ST]-E-R-E-A-R-x-L 

[ 1] Chan Y-L, Olvera J., Wool I.G. Biochem. Biophys. Res. Commun. 179:314-318(1991).
[ 2] Hwang I., Goodman H.M. Plant Physiol. 102:1357-1358(1993).
[ 3] Hoekstra R., Ferreira P.M., Bootsman T.C., Mager W.H., Planta R.J. Yeast 8:949-959(1992).

[1409] 556. Ribosomal protein S3Ae signature 

A number of eukaryotic and archaebacterial ribosomal proteins can be grouped on the basis of sequence similarities. 
One of these families consists of: 

- Mammalian S3A (was originally known as v-fos transformation effector protein).
- Caenorhabditis elegans S3A (F56F3.5).
- Plant cytoplasmic S3A (CYC07) [1].
- Yeast Rp10 (PLC1 and PLC2).
- Fission yeast Rp10 (SpAC13G6.02c).
- Methanococcus jannaschii MJ0980.



These proteins have from 220 to 250 amino acids. 

A conserved stretch in their N-terminal section was selected as a signature pattern.

- Consensus pattern: [LIV]-x-[GH]-R-[IV]-x-E-x-[SC]-L-x-D-L

[ 1] Liu J.H., Reid D.M. Plant Physiol. 109:338-338(1995).

[1410] 557. Ribosomal protein S3 signature

Ribosomal protein S3 is one of the proteins from the small ribosomal subunit. In Escherichia coli, S3 is known to be
involved in the binding of initiator Met-tRNA. It belongs to a family of ribosomal proteins which, on the basis of sequence
similarities [1], groups:

- Eubacterial S3.
- Algal and plant chloroplast S3.
- Cyanelle S3.
- Archaebacterial S3.
- Plant mitochondrial S3.
- Vertebrate S3.
- Insect S3.
- Caenorhabditis elegans S3 (C23G10.3).
- Yeast S3 (Rp13).


S3 is a protein of 209 to 559 amino-acid residues.

A conserved region located in the C-terminal section has been selected as a signature pattern. 

- Consensus pattern: [GSTAHKR^
x(2)-G-x(2)-G

[ 1] Otaka E., Hashimoto T., Mizuta K.
Protein Seq. Data Anal. 5:285-300(1993).
[1411] 558. Ribosomal protein S4 signature

Ribosomal protein S4 is one of the proteins from the small ribosomal subunit. In Escherichia coli, S4 is known to bind
directly to 16S ribosomal RNA. Mutations in S4 have been shown to increase translational error frequencies. It belongs
to a family of ribosomal proteins which, on the basis of sequence similarities [1,2], groups:

- Eubacterial S4.
- Algal and plant chloroplast S4.
- Cyanelle S4.
- Archaebacterial S4.
- Mammalian S9.
- Yeast YS11 (SUP45).
- Marchantia polymorpha mitochondrial S4.
- Dictyostelium discoideum rp1024.
- Yeast protein NAM9 [3]. NAM9 has been characterized as a suppressor for ochre mutations in mitochondrial DNA.
It could be a ribosomal protein that acts as a suppressor by decreasing translation accuracy.

S4 is a protein of 171 to 205 amino-acid residues (except for NAM9 which is much larger). The signature pattern for 
this protein is based on a conserved region located in the central section of these proteins. 

- Consensus pattern: [LIVMHDE]-x-RML
x-[LIVMF](2)

[ 1] Mizuta K., Hashimoto T., Suzuki K.L, Otaka E. Nucleic Acids Res. 19:2603-2608(1991).
[ 2] Otaka E., Hashimoto T., Mizuta K. Protein Seq. Data Anal. 5:285-300(1993).
[ 3] Boguta M., Dmochowska A., Borsuk P., Wrobel K., Gargouri A., Lazowska J., Slonimski P., Szczesniak B.,
Kruszowska A. Mol. Cell. Biol. 12:402-412(1992).

[1412] 559. Ribosomal protein S4e signature 

A number of eukaryotic and archaebacterial ribosomal proteins can be grouped on the basis of sequence similarities. 
One of these families consists of: 


- Mammalian S4 [1]. Two highly similar isoforms of this protein exist: one coded by a gene on chromosome Y, and
the other on chromosome X.
- Plant cytoplasmic S4 [2].
- Yeast S7 (YS6).
- Archebacterial S4e.

These proteins have 233 to 264 amino acids.

A highly conserved stretch of 15 residues in their N-terminal section has been selected as a signature pattern. Four 
positions in this region are positively charged residues. 
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- Consensus pattern: H-x-K-R-[LIVMF]-[SANK]-x-P-x(2)-[WY]-x-[UVM]^x-[KRP] 

[ 1] Fisher E.M., Beer-Romero P., Brown L.G., Ridley A., McNeil J.A., Lawrence J.B., Willard H.F., Bieber F.R.,
Page D.C. Cell 63:1205-1218(1990).
[ 2] Braun H.P., Emmermann M., Mentzel H., Schmitz U.K. Biochim. Biophys. Acta 1218:435-438(1994).
[1413] 560. Ribosomal protein S5 signature 

Ribosomal protein S5 is one of the proteins from the small ribosomal subunit. In Escherichia coli, S5 is known to be
important in the assembly and function of the 30S ribosomal subunit. Mutations in S5 have been shown to increase
translational error frequencies. It belongs to a family of ribosomal proteins which, on the basis of sequence similarities
[1,2], groups:

- Eubacterial S5.
- Cyanelle S5.
- Red algal chloroplast S5.
- Archaebacterial S5.
- Mammalian S2 (LLrep3).
- Caenorhabditis elegans S2 (C49H3.11).
- Drosophila S2.
- Plant S2.
- Yeast S4 (SUP44).
- Fungi mitochondrial S5.

S5 is a protein of 166 to 254 amino-acid residues. The signature pattern for this protein is based on a conserved region,
rich in glycine residues, and located in the N-terminal section of these proteins.

- Consensus pattern: G-[KRQ]-x(3)-[FY]-x-[ACV]-x(2)-[LIVMA]-[LIVM]-[AGHDN]-x(2)-G-x-[LIVM]-G-x-[SAG]-
x(5,6)-[DEQ]-[LIVMA]-x(2)-A-[LIVMF]

[ 1] All-Robyn J.A., Brown N., Otaka E., Liebman S.W. Mol. Cell. Biol. 10:6544-6553(1990).
[ 2] Otaka E., Hashimoto T., Mizuta K. Protein Seq. Data Anal. 5:285-300(1993).
[1414] 561. Ribosomal protein S6 signature 

Ribosomal protein S6 is one of the proteins from the small ribosomal subunit. In Escherichia coli, S6 is known to bind
together with S18 to 16S ribosomal RNA. It belongs to a family of ribosomal proteins which, on the basis of sequence
similarities, groups:

- Eubacterial S6.
- Red algal chloroplast S6.
- Cyanelle S6.

S6 is a protein of 95 to 208 amino-acid residues. The signature pattern for this protein is based on a conserved region
located in the N-terminal section of these proteins.

- Consensus pattern: G-x-[KRC]-[DENQRH]-L-[SA]-Y-x-I-[KRNSA]
[1415] 562. Ribosomal protein S6e signature 

A number of eukaryotic and archaebacterial ribosomal proteins can be grouped on the basis of sequence similarities.
One of these families consists of:

- Mammalian S6 [1].
- Drosophila S6 [2].
- Plant S6 [3].
- Yeast S10 (YS4).
- Halobacterium marismortui HS13 [4].
- Methanococcus jannaschii MJ1260.

S6 is the major substrate of protein
kinases in eukaryotic ribosomes [5]; it may have an important role in controlling cell growth and proliferation through
the selective translation of particular classes of mRNA.

These proteins have 135 to 249 amino acids. 

A conserved stretch of 12 residues in the N-terminal part of these proteins has been selected as a signature pattern.

Consensus pattern: [LIVM]-[STAMR]-G-G-x-D-x(2)-G-x-P-M

[ 1] Franco R., Rosenfeld M.G. J. Biol. Chem. 265:4321-4325(1990).
[ 2] Watson K.L., Konrad K.D., Woods D.F., Bryant P.J. Proc. Natl. Acad. Sci. U.S.A. 89:11302-11306(1992).
[ 3] Hansen G., Estruch J.J., Spena A. Nucleic Acids Res. 20:5230-5230(1992).
[ 4] Kimura M., Arndt E., Hatakeyama T., Hatakeyama T., Kimura J. Can. J. Microbiol. 35:195-199(1989).
[ 5] Bandi H.R., Ferrari S., Krieg J., Meyer H.E., Thomas G. J. Biol. Chem. 268:4530-4533(1993).

[1416] 563. Ribosomal protein S7 signature 

Ribosomal protein S7 is one of the proteins from the small ribosomal subunit. In Escherichia coli, S7 is known to bind
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directly to part of the 3'end of 16S ribosomal RNA. It belongs to a family of ribosomal proteins which, on the basis of 
sequence similarities [1,2,3], groups:

- Eubacterial S7.
- Algal and plant chloroplast S7.
- Cyanelle S7.
- Archaebacterial S7.
- Plant mitochondrial S7.
- Mammalian S5.
- Plant S5.
- Caenorhabditis elegans S5 (T05E11.1).

The best conserved region located in the N-terminal section of these proteins has been selected as a signature pattern. 

- Consensus pattern: [DENSK]oHUVMDET]o(<3HLIVM^
|STACJ

[ 1] Klussmann S., Franke P., Bergmann U., Kostka S., Wittmann-Liebold B. Biol. Chem. Hoppe-Seyler 374:
305-312(1993).
[ 2] Otaka E., Hashimoto T., Mizuta K. Protein Seq. Data Anal. 5:285-300(1993).
[ 3] Ignatovich O., Cooper M., Kulesza H.M., Beggs J.D. Nucleic Acids Res. 23:4616-4619(1995).

[1417] 564. Ribosomal protein S7e signature 

[1418] A number of eukaryotic ribosomal proteins can be grouped on the basis of sequence similarities [1]. One of
these families consists of:

- Mammalian S7.
- Xenopus S8.
- Insect S7.
- Yeast probable ribosomal protein S7 (N2212).
- Fission yeast probable ribosomal protein S7 (SpAC18G6.13c).

These proteins have about 200 amino acids. A highly conserved stretch of 14 residues which is located in the central
section and which is rich in charged residues was selected as a signature pattern.
[1419] Consensus pattern: [KR]-L-x-R-E-L-E-K-K-F-[SAP]-x-[KR]-H

[1420] [1] Salazar C.E., Mills-Hamm D.M., Kumar V., Collins F.H. Nucleic Acids Res. 21:4147-4147(1993). 
[1421] 565. Ribosomal protein S8 signature 

Ribosomal protein S8 is one of the proteins from the small ribosomal subunit. In Escherichia coli, S8 is known to bind
directly to 16S ribosomal RNA. It belongs to a family of ribosomal proteins which, on the basis of sequence similarities
[1], groups:

- Eubacterial S8.
- Algal and plant chloroplast S8.
- Cyanelle S8.
- Archaebacterial S8.
- Marchantia polymorpha mitochondrial S8.
- Mammalian S15A.
- Plant S15A.
- Yeast S22 (S24).

The best conserved region located in the C-terminal section of these proteins has been selected as a signature pattern.

- Consensus pattern: [GE]-x(2)-[LIV](2)-[STY]-[ST]-x(2)-G-[LIVM](2)-x(4)-[AG]-[KRHAYI]

[ 1] Otaka E., Hashimoto T., Mizuta K.
Protein Seq. Data Anal. 5:285-300(1993).

[1422] 566. Ribosomal protein S8e signature
A number of eukaryotic and archaebacterial ribosomal proteins can be grouped on the basis of sequence similarities
[1]. One of these families consists of:

- Mammalian S8.
- Caenorhabditis elegans S8 (F42C5.8).
- Leishmania major S8.
- Plant S8.
- Yeast S8 (S14) (Rp19).
- Archebacterial S8e.

These proteins have either about 220 amino acids (in eukaryotes) or about 125 amino acids (in archebacteria). A
conserved stretch which is located in the N-terminal section and which is rich in positively charged residues has been
selected as a signature pattern.

- Consensus pattern: [KR]-x(2)-[ST]-G-[GA]-x(5)-[HR]-[KG]-[KR]-x-K-x-E-[LM]-G
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[ 1] Engemann S., Herfurth E., Briesemeister U. t Wittmann-Liebold B. 

J. Protein Chem. 14:189-195(1995). 

[1423] 567. Ribosomal protein S9 signature 

Ribosomal protein S9 is one of the proteins from the small ribosomal subunit. It belongs to a family of ribosomal proteins
which, on the basis of sequence similarities [1,2], groups:

- Eubacterial S9.
- Algal chloroplast S9.
- Cyanelle S9.
- Archaebacterial S9.
- Mammalian S16.
- Plant S16.
- Yeast mitochondrial ribosomal S9.

A conserved region containing many charged residues and located in the central section of these proteins has been
selected as a signature pattern.

- Consensus pattern: G-G-G-x(2)-[GSA]-Q-x(2)-[SA]-x(3)-[GSA]-x-[GSTAV]-[KR]-[GSAL]-[LIF]

[ 1] Chan Y-L, Paz V., Olvera J., Wool I.G. FEBS Lett. 263:85-88(1990).
[ 2] Otaka E., Hashimoto T., Mizuta K.
Protein Seq. Data Anal. 5:285-300(1993).

[1424] 568. Ribulose-phosphate 3-epimerase family signatures 
Ribulose-phosphate 3-epimerase (EC 5.1.3.1) (also known as pentose-5-phosphate 3-epimerase or PPE) is the enzyme
that converts D-ribulose 5-phosphate into D-xylulose 5-phosphate in Calvin's reductive pentose phosphate cycle.
In Alcaligenes eutrophus two copies of the gene coding for PPE are known [1]; one is chromosomally encoded (cbbEC),
the other one is on a plasmid (cbbEP). PPE has been found in a wide range of bacteria, archebacteria, fungi and plants.
The sequence of PPE is highly related to:

- Escherichia coli D-allulose-6-phosphate 3-epimerase (gene alsE).
- Escherichia coli protein sgcE.
- Mycoplasma genitalium hypothetical protein MG112.

All these proteins have from 209 to 241 amino acid residues.

Two conserved regions which are located respectively in the N-terminal and in the central part of these proteins have
been selected as signature patterns.

- Consensus pattern: [LIVMF]-H-[LIVMFY]-D-[LIVM]-x-D-x(1,2)-[FY]-[LIVM]-x-N-x-[STAV]
- Consensus pattern: [LIVMA]-x-[LIVM]-M-[ST]-[VS]-x-P-x(3)-G-Q-x-F-x(6)-[NK]-[LIVMC]
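
The N-terminal pattern contains a variable-length gap, x(1,2), so a matching segment may be 14 or 15 residues long. The sketch below, a hand translation of that pattern into a regular expression, captures the gap so its length can be inspected. The function name `gap_length` and the three peptide fragments are invented for illustration and are not taken from the specification or from any database entry.

```python
import re

# Hand translation of the N-terminal consensus pattern:
# [LIVMF]-H-[LIVMFY]-D-[LIVM]-x-D-x(1,2)-[FY]-[LIVM]-x-N-x-[STAV]
# The x(1,2) gap is captured so that its length can be inspected.
PPE_N_TERMINAL = re.compile(r"[LIVMF]H[LIVMFY]D[LIVM].D(.{1,2})[FY][LIVM].N.[STAV]")


def gap_length(sequence):
    """Return the length of the x(1,2) gap if the pattern matches, else None."""
    match = PPE_N_TERMINAL.search(sequence)
    return len(match.group(1)) if match else None


# Invented fragments, for illustration only.
print(gap_length("LHLDIMDGHFVPNLT"))  # two residues between D and [FY]
print(gap_length("LHLDIMDGFVPNLT"))   # one residue between D and [FY]
print(gap_length("LHLDIMDFVPNLT"))    # no gap residue, so the pattern does not match
```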

[ 1] Kusian B., Yoo J.G., Bednarski R., Bowien B. 
J. Bacteriol. 174:7337-7344(1992). 

[1425] 569. (Ricin B lectin) Similarity to lectin domain of ricin beta-chain, 3 copies. 
[1426] This family consists of a triplicated domain involved in cell agglutination in ricin.
[1427] 570. (Rotamase) PpiC-type peptidyl-prolyl cis-trans isomerase signature 

Peptidyl-prolyl cis-trans isomerase (EC 5.2.1.8) (PPIase or rotamase) is an enzyme that accelerates protein folding
by catalyzing the cis-trans isomerization of proline imidic peptide bonds in oligopeptides [1]. Most characterized PPIases
belong to two families, the cyclophilin-type (see <PDOC00154>) and the FKBP-type (see <PDOC00426>). Recently
a third family has been discovered [2,3]. So far, the only biochemically characterized member of this family is the
Escherichia coli protein parvulin (gene ppiC), a small (92 residues) cytoplasmic enzyme that prefers amino acid residues
with hydrophobic side chains like leucine and phenylalanine in the P1 position of the peptide substrates. PpiC
is evolutionarily related to a number of proteins that are also probably PPIases:

- Escherichia coli and Haemophilus influenzae ppiD. PpiD is a PPIase which contains a periplasmic ppiC-like domain
anchored to the inner membrane and which seems to be involved in the folding of outer membrane proteins.
- Escherichia coli surA. SurA is a periplasmic protein that contains two ppiC-like domains.
- Nitrogen-assimilating bacteria protein nifM, which is involved in the activation and stabilization of the iron component
(nifH) of nitrogenase.
- Bacillus subtilis protein prsA, a membrane-bound lipoprotein involved in protein export.
- Lactococcus and Lactobacillus protease maturation protein prtM, a membrane-bound lipoprotein involved in the
maturation of a secreted serine proteinase.
- Yeast protein ESS1/PTF1 (processing/termination factor 1).
- Drosophila protein dodo (gene dod).
- Mammalian protein PIN1.
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Campylobacter jejuni cell binding factor 2 (CBF2), a secreted antigen. 

- Bacillus subtilis hypothetical protein yacD.
- Helicobacter pylori hypothetical protein HP0175.
- A hypothetical slime mold protein.

A conserved region that contains a serine which could play a role in the catalytic mechanism of these enzymes has 
been selected as a signature pattern. 

- Consensus pattern: F-[GSADEI]-x-[LVAQ]-A-x(3)-[ST]-x(3,4)-[STQ]-x(3,5)-[GER]-G-x-[LIVM]-[GS]

[ 1] Fischer G., Schmid F.X.
Biochemistry 29:2205-2212(1990).
[ 2] Rudd K.E., Sofia H.J., Koonin E.V., Plunkett G. III, Lazar S.,
Rouviere P.E. Trends Biochem. Sci. 20:14-15(1995).
[ 3] Rahfeld J.-U., Ruecknagel K.P., Schelbert B., Ludwig B., Hacker J.,
Mann K., Fischer G. FEBS Lett. 352:180-184(1994).

[1428] 571. (RrnaAD) Ribosomal RNA adenine dimethylases signature

A number of enzymes responsible for the dimethylation of adenosines in ribosomal RNAs (EC 2.1.1.48) have been
found [1,2] to be evolutionarily related. These enzymes are:

- Bacterial 16S rRNA dimethylase (gene ksgA), which acts in the biogenesis of ribosomes by catalyzing the dimethylation
of two adjacent adenosines in the loop of a conserved hairpin near the 3'-end of 16S rRNA. Inactivation of
ksgA leads to resistance to the aminoglycoside antibiotic kasugamycin.
- Yeast 18S rRNA dimethylase (gene DIM1), which is functionally similar to ksgA and which dimethylates twin adenosines
in the 3'-end of 18S rRNA.
- Bacterial 'erm' methylases. These enzymes confer resistance to macrolide-lincosamide-streptogramin B (MLS)
antibiotics, such as erythromycin, by dimethylating the adenine residue at position 2058 of 23S rRNA, thus resulting
in a reduced affinity between ribosomes and the MLS antibiotics.
- Caenorhabditis elegans hypothetical protein E02H1.1.

The best conserved region in these enzymes is located in the N-terminal section and corresponds to a region that is
probably involved in S-adenosyl methionine (SAM) binding.

- Consensus pattern: [LIVM]-[LIVMFY]-[DE]-x-G-[STAP\^-G^
[STAGV]-[LIVMFYHC]-E-x-D

[ 1] van Gemen B., van Knippenberg P.H. (In) Nucleic acid methylation, Clawson G.A., Willis D.B., Weissbach A.,
Jones P.A., Eds., pp. 19-36, Alan R. Liss Inc., New York, (1990).
[ 2] Lafontaine D., Delcour J., Glasser A.L., Desgres J., Vandenhaute J. J. Mol. Biol. 241:492-497(1994).

[1429] 572. (RuBisCO small) Ribulose bisphosphate carboxylase, small chain. 206 members 
[1430] 573. ATP/GTP-binding site motif A (P-loop) (ras) 

From sequence comparisons and crystallographic data analysis it has been shown [1,2,3,4,5,6] that an appreciable
proportion of proteins that bind ATP or GTP share a number of more or less conserved sequence motifs. The best
conserved of these motifs is a glycine-rich region, which typically forms a flexible loop between a beta-strand and an
alpha-helix. This loop interacts with one of the phosphate groups of the nucleotide. This sequence motif is generally
referred to as the 'A' consensus sequence [1] or the 'P-loop' [5]. There are numerous ATP- or GTP-binding proteins in
which the P-loop is found. A number of protein families for which the relevance of the presence of such a motif has
been noted are listed below:

- ATP synthase alpha and beta subunits.
- Myosin heavy chains.
- Kinesin heavy chains and kinesin-like proteins.
- Dynamins and dynamin-like proteins.
- Guanylate kinase.
- Thymidine kinase.
- Thymidylate kinase.
- Shikimate kinase.
- Nitrogenase iron protein family (nifH/frxC).
- ATP-binding proteins involved in 'active transport' (ABC transporters) [7].
- DNA and RNA helicases [8,9,10].
- GTP-binding elongation factors (EF-Tu, EF-1alpha, EF-G, EF-2, etc.).
- Ras family of GTP-binding proteins (Ras, Rho, Rab, Ral, Ypt1, SEC4, etc.).
- Nuclear protein ran.
- ADP-ribosylation factors family.
- Bacterial dnaA protein.
- Bacterial recA protein.
- Bacterial recF protein.
- Guanine nucleotide-binding proteins alpha subunits (Gi, Gs, Gt, G0, etc.).
- DNA mismatch repair proteins mutS family.
- Bacterial type II secretion system protein E.

Not all ATP- or GTP-binding proteins are picked up by this motif. A number of
proteins escape detection because the structure of their ATP-binding site is completely different from that of the P-
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loop. Examples of such proteins are the E1-E2 ATPases or the glycolytic kinases. In other ATP- or GTP-binding proteins 
the flexible loop exists in a slightly different form; this is the case for tubulins or protein kinases. A special mention must
be reserved for adenylate kinase, in which there is a single deviation from the P-loop pattern: in the last position Gly is
found instead of Ser or Thr.
Consensus pattern: [AG]-x(4)-G-K-[ST]

In addition to the proteins listed above, the 'A' motif is also found in a number of other proteins. Most of these proteins
probably bind a nucleotide, but others are definitely not ATP- or GTP-binding (as for example chymotrypsin, or human
ferritin light chain).
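
As a minimal sketch of how the 'A' motif can be located in a sequence, the pattern [AG]-x(4)-G-K-[ST] is written below as a regular expression inside a lookahead, so that overlapping occurrences would also be reported. The relaxed variant, which additionally accepts Gly in the last position, is an assumption made here to illustrate the adenylate kinase deviation described above. The function name `find_motif` and both sequence fragments are illustrative only.

```python
import re

# [AG]-x(4)-G-K-[ST], wrapped in a lookahead so overlapping hits are reported.
P_LOOP = re.compile(r"(?=([AG].{4}GK[ST]))")
# Assumed relaxation for the adenylate kinase deviation: Gly allowed in the last position.
P_LOOP_RELAXED = re.compile(r"(?=([AG].{4}GK[STG]))")


def find_motif(regex, sequence):
    """Return (1-based start, matched segment) for every occurrence."""
    return [(m.start() + 1, m.group(1)) for m in regex.finditer(sequence)]


# Illustrative fragments only.
ras_like = "MTEYKLVVVGAGGVGKSALTIQ"
kinase_like = "MRIILLGAPGSGKGTQA"

print(find_motif(P_LOOP, ras_like))             # strict pattern finds the glycine-rich loop
print(find_motif(P_LOOP, kinase_like))          # strict pattern misses the Gly-ending variant
print(find_motif(P_LOOP_RELAXED, kinase_like))  # relaxed pattern recovers it
```

Because the motif is short, a hit only suggests nucleotide binding; as noted above, it also occurs in proteins that do not bind ATP or GTP.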

[ 1] Walker J.E., Saraste M., Runswick M.J., Gay N.J. EMBO J. 1:945-951(1982).
[ 2] Moller W., Amons R. FEBS Lett. 186:1-7(1985).
[ 3] Fry D.C., Kuby S.A., Mildvan A.S. Proc. Natl. Acad. Sci. U.S.A. 83:907-911(1986).
[ 4] Dever T.E., Glynias M.J., Merrick W.C. Proc. Natl. Acad. Sci. U.S.A. 84:1814-1818(1987).
[ 5] Saraste M., Sibbald P.R., Wittinghofer A. Trends Biochem. Sci. 15:430-434(1990).
[ 6] Koonin E.V. J. Mol. Biol. 229:1165-1174(1993).
[ 7] Higgins C.F., Hyde S.C., Mimmack M.M., Gileadi U., Gill D.R., Gallagher M.P. J. Bioenerg. Biomembr. 22:571-592(1990).
[ 8] Hodgman T.C. Nature 333:22-23(1988) and Nature 333:578-578(1988) (Errata).
[ 9] Linder P., Lasko P., Ashburner M., Leroy P., Nielsen P.J., Nishi K., Schnier J., Slonimski P.P. Nature 337:121-122(1989).
[10] Gorbalenya A.E., Koonin E.V., Donchenko A.P., Blinov V.M. Nucleic Acids Res. 17:4713-4730(1989).

[1431] GTP-binding nuclear protein ran signature (ras) 

Ran (or TC4) is a small abundant nuclear protein that binds and hydrolyzes GTP and which has been implicated in a
large number of processes including nucleocytoplasmic transport, RNA synthesis, processing and export and cell cycle
checkpoint control [1,2]. Ran is generally included in the RAS 'superfamily' of small GTP-binding proteins [3], but it is
only slightly related to the other RAS proteins. It also differs from RAS proteins in that it lacks cysteine residues at its
C-terminus and is therefore not subject to prenylation. Instead ran has an acidic C-terminus. It is, however, similar to
RAS family members in requiring a specific guanine nucleotide exchange factor (GEF) and a specific GTPase activating
protein (GAP) as stimulators of overall GTPase activity. The region of the GTP-binding B motif which, in ran, is perfectly
conserved has been selected as a signature pattern.

Consensus pattern: D-T-A-G-Q-E-K-[LF]-G-G-L-R-[DE]-G-Y-Y-

Proteins belonging to this family also contain a copy of the ATP/GTP-binding motif 'A' (P-loop).

[ 1] Scheffzek K., Klebe C., Fritz-Wolf K., Kabsch W., Wittinghofer A. Nature 374:378-381(1995).
[ 2] Rush M.G., Drivas G., d'Eustachio P. BioEssays 18:103-112(1996).
[ 3] Valencia A., Chardin P., Wittinghofer A., Sander C. Biochemistry 30:4637-4648(1991).

[1432] 574. recA signature

The bacterial recA protein [1,2,3,E1] is essential for homologous recombination and recombinational repair of DNA
damage. RecA has many activities: it forms filaments, it binds to single- and double-stranded DNA, it binds and hydrolyzes
ATP, it is also a recombinase and, finally, it interacts with lexA causing its activation and leading to its autocatalytic
cleavage. RecA is a protein of about 350 amino-acid residues. Its sequence is very well conserved [3,4,5,E1] among
eubacterial species. It is also found in the chloroplast of plants [6]. The best conserved region, a nonapeptide located
in the middle of the sequence which is part of the monomer-monomer interface in a recA filament, has been selected
as a signature pattern.

Consensus pattern: A-L-[KR]-[IF]-[FY]-[STA]-[STAD]-[LIVMQ]-R

[ 1] Smith K.C., Wang T.-C.V. BioEssays 10:12-16(1989).
[ 2] Lloyd A.T., Sharp P.M. J. Mol. Evol. 37:399-407(1993).
[ 3] Roca A.I., Cox M.M. Prog. Nucleic Acids Res. Mol. Biol. 56:129-223(1997).
[ 4] Karlin S., Weinstock G.M., Brendel V. J. Bacteriol. 177:6881-6893(1995).
[ 5] Eisen J.A. J. Mol. Evol. 41:1105-1123(1995).
[ 6] Cerutti H.D., Osman M., Grandoni P., Jagendorf A.T. Proc. Natl. Acad. Sci. U.S.A. 89:8068-8072(1992).
[E1] http://www.tiqr.org/~jeisen/RecA/RecA.html

[1433] 575. Response regulator receiver domain 
This domain receives the signal from the sensor partner in bacterial two-component systems. It is usually
found N-terminal to a DNA binding effector domain.
[1] Pao GM, Saier MH; J Mol Evol 1995;40:136-154.
[1434] 576. Ribonucleotide reductase large subunit signature 

Ribonucleotide reductase (EC 1.17.4.1) [1,2] catalyzes the reductive synthesis of deoxyribonucleotides from their
corresponding ribonucleotides. It provides the precursors necessary for DNA synthesis. Ribonucleotide reductase is
an oligomeric enzyme composed of a large subunit (700 to 1000 residues) and a small subunit (300 to 400 residues).
There are regions of similarities in the sequence of the large chain from prokaryotes, eukaryotes and viruses. One of
these regions has been selected as a signature pattern.

[1435] Consensus pattern: W-x(2)-[LF]-x(6,7)-G-[LIVM]-[FYRA]-[NH]-x(3)-[STAQLIVM]-[ASC]-x(2)-[PA]-
[ 1] Nillson O., Lundqvist T., Hahne S., Sjoberg B.-M. Biochem. Soc. Trans. 16:91-94(1988).
[ 2] Reichard P. Science 260:1773-1777(1993).
[1436] 577. Ribonuclease T2 family histidine active sites 

The fungal ribonucleases T2 from Aspergillus oryzae, M from Aspergillus saitoi and Rh from Rhizopus niveus are
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structurally and functionally related 30 Kdglycoproteins [1] that cleave the 3"-5" internucleotide linkage of RNA v.a a 
nucleotide 2',3'-cyclic phosphate intermediates (EC 3.1.27.1). A number of other RNAses have been found to be
evolutionarily related to these fungal enzymes:

- Self-incompatibility [2] in flowering plants is often controlled by a single
gene (S gene) that has several alleles. This gene prevents fertilization by self-pollen or by pollen bearing either of the
two S-alleles expressed in the style. The self-incompatibility glycoprotein from several higher plants of the solanaceae
family has been shown [2,3] to be a ribonuclease.
- Phosphate-starvation induced RNAses LE and LX from tomato
[4]. These two enzymes are probably involved in a phosphate-starvation rescue system.
- Escherichia coli periplasmic RNAse I (EC 3.1.27.6) (gene rna) [5].
- Aeromonas hydrophila periplasmic RNAse.
- Haemophilus influenzae hypothetical protein HI0526.

Two histidine residues have been shown [6,7] to be involved in the catalytic mechanism of
RNase T2 and Rh. These residues and the region around them are highly conserved in all the sequences described
above. Two signature patterns have been developed, one for each of the two active-site histidines. The second pattern
also contains a cysteine which is known to be involved in a disulfide bond.
Consensus pattern: [FYWL]-x-[LIVM]-H-G-L-W-P [H is an active site residue]

Consensus pattern: [LIVMF]-x(2)-[HDGTY]-[EQ]-[FYW]-x-[KR]-H-G-x-C [H is an active site residue] [C is involved in

[ 1] Watanabe H., Naitoh A., Suyama Y., Inokuchi N., Shimada H., Koyama T., Ohgi K., Irie M. J. Biochem. 108:303-310(1990).
[ 2] Haring V., Gray J.E., McClure B.A., Anderson M.A., Clarke A.E. Science 250:937-941(1990).
[ 3] McClure B.A., Haring V., Ebert P.R., Anderson M.A., Simpson R.J., Sakiyama F., Clarke A.E. Nature 342:955-957(1989).
[ 4] Loeffler A., Glund K., Irie M. Eur. J. Biochem. 214:627-633(1993).
[ 5] Meador J. III, Kennell D. Gene 95:1-7(1990).
[ 6] Kawata Y., Sakiyama F., Hayashi R., Kyogoku Y. Eur. J. Biochem. 187:255-262(1990).
[ 7] Kurihara H., Mitsui Y., Ohgi K., Irie M., Mizuno H., Nakamura K.T. FEBS Lett. 306:189-192(1992).

[1437] 578. Ribonucleotide reductase large subunit signature. Ribonucleotide reductase (EC 1.17.4.1) [1,2] catalyzes
the reductive synthesis of deoxyribonucleotides from their corresponding ribonucleotides. It provides the precursors
necessary for DNA synthesis. Ribonucleotide reductase is an oligomeric enzyme composed of a large subunit (700 to
1000 residues) and a small subunit (300 to 400 residues). There are regions of similarities in the sequence of the large
chain from prokaryotes, eukaryotes and viruses. One of these regions has been developed as a signature pattern.
[1438] Consensus pattern: W-x(2)-[LF]-x(6,7)-G-[LIVM]-[FYRA]-[NH

[1439] [ 1] Nillson O., Lundqvist T., Hahne S., Sjoberg B.-M. Biochem. Soc. Trans. 16:91-94(1988). [ 2] Reichard P.
Science 260:1773-1777(1993).

[1440] 579. RNase H

RNase H digests the RNA strand of an RNA/DNA hybrid. Important enzyme in retroviral replication cycle, and often
found as a domain associated with reverse transcriptases. Structure is a mixed alpha + beta fold with three a/b/a layers.
[1441] 580. Eukaryotic putative RNA-binding region RNP-1 signature (rrm)

Many eukaryotic proteins that are known or supposed to bind single-stranded RNA contain one or more copies of a
putative RNA-binding domain of about 90 amino acids [1,2]. This region has been found in the following proteins:

Heterogeneous nuclear ribonucleoproteins:
- hnRNP A1 (helix destabilizing protein) (twice).
- hnRNP A2/B1 (twice).
- hnRNP C (C1/C2) (once).
- hnRNP E (UP2) (at least once).
- hnRNP G (once).

Small nuclear ribonucleoproteins:
- U1 snRNP 70 Kd (once).
- U1 snRNP A (once).
- U2 snRNP B" (once).

Pre-RNA and mRNA associated proteins:
- Protein synthesis initiation factor 4B (eIF-4B) [3], a protein essential for the binding of mRNA to
- Nucleolin (4 times).
- Yeast single-stranded nucleic acid-binding protein (gene SSB1) (once).
- Yeast protein NSR1 (twice). NSR1 is involved in pre-rRNA processing; it specifically binds nuclear localization sequences.
- Poly(A) binding protein (PABP) (4 times).

Others:
- Drosophila sex determination protein Sex-lethal (Sxl) (twice).
- Drosophila sex determination protein Transformer-2 (Tra-2) (once).
- Drosophila 'elav' protein (3 times), which is probably involved in the RNA metabolism of neurons.
- Human paraneoplastic encephalomyelitis antigen HuD (3 times) [4], which is highly
similar to elav and which may play a role in neuron-specific RNA processing.
- Drosophila bicoid protein (once) [5], a segment-polarity homeobox protein that may also bind to specific mRNAs.
- La antigen (once), a protein which may play a role in the transcription of RNA polymerase III.
- The 60 Kd Ro protein (once), a putative RNP complex protein.
- A maize protein induced by abscisic acid in response to water stress, which seems to be a RNA-binding protein.
- Three tobacco proteins, located in the chloroplast [6], which may be involved in splicing and/or processing of chloroplast
RNAs (twice).
- X16 [7], a mammalian protein which may be involved in RNA processing in relation with cellular proliferation
and/or maturation.
- Insulin-induced growth response protein CI-4 from rat (twice).
- TIAR (3 times) [8], which possesses nucleolytic activity against cytotoxic lymphocyte target cells and may be involved in
apoptosis.
- Yeast RNA15 protein, which plays a role in mRNA stability and/or poly-(A) tail length [9].

Inside the putative RNA-binding domain there are two regions which are highly conserved. The first one is a hydrophobic segment of six
residues (which is called the RNP-2 motif), the second one is an octapeptide motif (which is called RNP-1 or RNP-CS).
The position of both motifs in the domain is shown in the following schematic representation:
[1442] xxxxxxx######xxxxxxxxxxxxxxxxxxxxxxxxxxxxx#####«##xxxxxxxxxxxxxxxxxxxxxxxx x RNP-2 RNP-1 
The RNP-1 motif has been used as a signature pattern for this type of domain. 
Consensus pattern: [RK]-G-{EDRKHPCG}-[AGSCI]-[FY]-[LIVA]-x-[FYLM] In most cases the residue in position 3 of the pattern is either Tyr or Phe.
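
The consensus patterns in this listing follow the PROSITE notation: `x` is any residue, `[...]` lists allowed residues, `{...}` lists forbidden residues, and `(n)` or `(n,m)` gives a repeat count. As a minimal sketch (not the search procedure used in the specification), such a pattern can be translated into a regular expression and scanned against a protein sequence. The function name `prosite_to_regex` and the two test peptides are hypothetical and serve only as illustration.

```python
import re

def prosite_to_regex(pattern: str) -> str:
    """Translate a PROSITE-style consensus pattern into a Python regex string."""
    atoms = []
    for element in pattern.strip().split("-"):
        element = element.strip()
        # Separate the residue specification from an optional repeat: (n) or (n,m).
        m = re.fullmatch(r"(.+?)(?:\((\d+)(?:,(\d+))?\))?", element)
        if m is None:
            raise ValueError(f"empty pattern element in {pattern!r}")
        core, low, high = m.group(1), m.group(2), m.group(3)
        if core == "x":
            atom = "."                          # any residue
        elif core.startswith("[") and core.endswith("]"):
            atom = core                         # one of the listed residues
        elif core.startswith("{") and core.endswith("}"):
            atom = "[^" + core[1:-1] + "]"      # any residue except those listed
        elif len(core) == 1 and core.isalpha():
            atom = core                         # a single fixed residue
        else:
            raise ValueError(f"unrecognised pattern element: {element!r}")
        if low is not None:
            atom += "{" + low + ("," + high if high is not None else "") + "}"
        atoms.append(atom)
    return "".join(atoms)

# RNP-1 signature as given in the text.
RNP1 = "[RK]-G-{EDRKHPCG}-[AGSCI]-[FY]-[LIVA]-x-[FYLM]"
rnp1_regex = re.compile(prosite_to_regex(RNP1))

# Hypothetical toy peptides, not real protein sequences.
matching = "AAKGFGFVTFAA"   # Phe at pattern position 3
excluded = "AAKGEGFVTFAA"   # Glu at pattern position 3 is forbidden

if __name__ == "__main__":
    print(rnp1_regex.pattern)
    print(rnp1_regex.search(matching))
    print(rnp1_regex.search(excluded))
```

The exclusion class `{EDRKHPCG}` becomes a negated character class, which is why the second toy peptide is rejected even though every other position fits.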

[1] Bandziulis R.J., Swanson M.S., Dreyfuss G. Genes Dev. 3:431-437(1989). [2] Dreyfuss G., Swanson M.S., Pinol-Roma S. Trends Biochem. Sci. 13:86-91(1988). [3] Milburn S.C., Hershey J.W.B., Davies M.V., Kelleher K., Kaufman R.J. EMBO J. 9:2783-2790(1990). [4] Szabo A., Dalmau J., Manley G., Rosenfeld M., Wong E., Henson J., Posner J.B., Furneaux H.M. Cell 67:325-333(1991). [5] Rebagliati M. Cell 58:231-232(1989). [6] Li Y., Sugiura M. EMBO J. 9:3059-3066(1990). [7] Ayane M., Preuss U., Koehler G., Nielsen P.J. Nucleic Acids Res. 19:1273-1278(1991). [8] Kawakami A., Tian Q., Duan X., Streuli M., Schlossman S.F., Anderson P. Proc. Natl. Acad. Sci. U.S.A. 89:8681-8685(1992). [9] Minvielle-Sebastia L., Winsor B., Bonneaud N., Lacroute F. Mol. Cell. Biol. 11:3075-3087(1991).
[1443] 581. Rubredoxin signature

[1444] Rubredoxins [1] are small electron-transfer prokaryotic proteins. They contain an iron atom which is ligated by four cysteine residues. Rubredoxins are, in some cases, functionally interchangeable with ferredoxins.
[1445] A conserved region that includes two of the cysteine residues that bind the iron atom has been selected as 
a pattern for these proteins. 

[1446] Consensus pattern: [LIVM]-x(3)-W-x-C-P-x-C-[AGD] [The two C's bind the iron atom] 

In Pseudomonas oleovorans rubredoxin 2 (gene alkG) [2], this pattern is found twice because alkG has two rubredoxin domains.

Rubrerythrin [3], a protein with inorganic pyrophosphatase activity from Desulfovibrio vulgaris, possesses a C-terminal rubredoxin-like domain, but this domain is too divergent to be detected by the above pattern.
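
A protein with two rubredoxin domains yields two hits for the same pattern. The sketch below hand-translates the rubredoxin consensus pattern into a regular expression and reports, for every hit, the 1-based positions of the two iron-binding cysteines. The sequence `toy_two_domain` is a hypothetical string built for illustration; it is not the alkG sequence.

```python
import re

# Hand translation of [LIVM]-x(3)-W-x-C-P-x-C-[AGD];
# the two capture groups mark the cysteines that bind the iron atom.
RUBREDOXIN = re.compile(r"[LIVM].{3}W.(C)P.(C)[AGD]")

def rubredoxin_hits(sequence):
    """Return (start, cys1, cys2) for each hit, all as 1-based positions."""
    return [
        (m.start() + 1, m.start(1) + 1, m.start(2) + 1)
        for m in RUBREDOXIN.finditer(sequence)
    ]

# Hypothetical sequence carrying two copies of the motif.
toy_two_domain = "MK" + "LAEKWICPDCG" + "TTSSTT" + "VPGDWECPVCA" + "KK"

if __name__ == "__main__":
    for start, cys1, cys2 in rubredoxin_hits(toy_two_domain):
        print(f"hit at {start}: iron-binding Cys at {cys1} and {cys2}")
```

A divergent domain such as the one described for rubrerythrin would simply produce no hit with this expression, which is the limitation the text points out.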

[1447] [1] Berg J.M., Holm R.H. (In) Iron-sulfur proteins, Spiro T.G., Ed., pp1-66, Wiley, New York, (1982). [2] Kok M., Oldenhuis R., van der Linden M.P.G., Meulenberg C.H.C., Kingma J., Witholt B. J. Biol. Chem. 264:5442-5451(1989). [3] van Beeumen J.J., van Driessche G., Liu M.-Y., Le Gall J. J. Biol. Chem. 266:20645-20653(1991).
[1448] 582. (rvp) Eukaryotic and viral aspartyl proteases active site 

Aspartyl proteases, also known as acid proteases, (EC 3.4.23.-) are a widely distributed family of proteolytic enzymes [1,2,3] known to exist in vertebrates, fungi, plants, retroviruses and some plant viruses. Aspartate proteases of eukaryotes are monomeric enzymes which consist of two domains. Each domain contains an active site centered on a catalytic aspartyl residue. The two domains most probably evolved from the duplication of an ancestral gene encoding a primordial domain. Currently known eukaryotic aspartyl proteases are: - Vertebrate gastric pepsins A and C (also known as gastricsin). - Vertebrate chymosin (rennin), involved in digestion and used for making cheese. - Vertebrate lysosomal cathepsins D (EC 3.4.23.5) and E (EC 3.4.23.34). - Mammalian renin (EC 3.4.23.15) whose function is to generate angiotensin I from angiotensinogen in the plasma. - Fungal proteases such as aspergillopepsin A (EC 3.4.23.18), candidapepsin (EC 3.4.23.24), mucoropepsin (EC 3.4.23.23) (mucor rennin), endothiapepsin (EC 3.4.23.22), polyporopepsin (EC 3.4.23.29), and rhizopuspepsin (EC 3.4.23.21). - Yeast saccharopepsin (EC 3.4.23.25) (proteinase A) (gene PEP4). PEP4 is implicated in posttranslational regulation of vacuolar hydrolases. - Yeast barrier pepsin (EC 3.4.23.35) (gene BAR1); a protease that cleaves alpha-factor and thus acts as an antagonist of the mating pheromone. - Fission yeast sxa1 which is involved in degrading or processing the mating pheromones.

Most retroviruses and some plant viruses, such as badnaviruses, encode an aspartyl protease which is a homodimer of a chain of about 95 to 125 amino acids. In most retroviruses, the protease is encoded as a segment of a polyprotein which is cleaved during the maturation process of the virus. It is generally part of the pol polyprotein and, more rarely, of the gag polyprotein. Conservation of the sequence around the two aspartates of eukaryotic aspartyl proteases and around the single active site of the viral proteases allows us to develop a single signature pattern for both groups of protease.
Consensus pattern: [LIVMFGAC]-[LIVMTADN]-[LIVFSA]-D-[ST]-G-[STAV]-[STAPDENQ]-x-[LIVMFSTNC]-x-[LIVMFGTA] [D is the active site residue]

[1] Foltmann B. Essays Biochem. 17:52-84(1981). [2] Davies D.R. Annu. Rev. Biophys. Chem. 19:189-215(1990). [3] Rao J.K.M., Erickson J.W., Wlodawer A. Biochemistry 30:4663-4671(1991). [4] Rawlings N.D., Barrett A.J. Meth. Enzymol. 248:105-120(1995).
[1449] 583. (rvt) Reverse transcriptase (RNA-dependent DNA polymerase) 

A reverse transcriptase gene is usually indicative of a mobile element such as a retrotransposon or retrovirus. Reverse 
transcriptases occur in a variety of mobile elements, including retrotransposons, retroviruses, group II introns, bacterial 
msDNAs, hepadnaviruses, and caulimoviruses. Number of members: 1233 

[1450] [1] Medline: 91006031. Origin and evolution of retroelements based upon their reverse transcriptase sequences. Xiong Y, Eickbush TH; EMBO J 1990;9:3353-3362.
[1451] 584. (S-AdoMet synt) S-adenosylmethionine synthetase signatures 

S-adenosylmethionine synthetase (EC 2.5.1.6) is the enzyme that catalyzes the formation of S-adenosylmethionine (AdoMet) from methionine and ATP [1]. AdoMet is an important methyl donor for transmethylation and is also the propylamino donor in polyamine biosynthesis. In bacteria there is a single isoform of AdoMet synthetase (gene metK), there are two in budding yeast (genes SAM1 and SAM2) and in mammals, while in plants there is generally a multigene family. The sequence of AdoMet synthetase is highly conserved throughout isozymes and species. Two signature patterns have been selected for this type of enzyme; the first is a hexapeptide which seems to be involved in ATP-binding; the second is an almost perfectly conserved glycine-rich nonapeptide.

Consensus pattern: G-A-G-D-Q-G-x(3)-G-[FYH]
Consensus pattern: G-[GA]-G-[ASC]-F-S-x-K-[DE]
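
When a family is described by two signature patterns, a candidate sequence can be checked against both. The following sketch hand-translates the two AdoMet synthetase patterns above into regular expressions and reports where each one first occurs. Requiring both signatures is an assumption made here for illustration, not a rule stated in the text, and the test sequences are hypothetical toy strings.

```python
import re

# Hand translations of the two consensus patterns given in the text.
SIGNATURES = {
    "ATP-binding hexapeptide region": re.compile(r"GAGDQG.{3}G[FYH]"),
    "glycine-rich nonapeptide": re.compile(r"G[GA]G[ASC]FS.K[DE]"),
}

def signature_report(sequence):
    """Map each signature name to its first 1-based start position, or None."""
    report = {}
    for name, regex in SIGNATURES.items():
        match = regex.search(sequence)
        report[name] = match.start() + 1 if match else None
    return report

def has_all_signatures(sequence):
    """True only if every signature occurs at least once (illustrative rule)."""
    return all(pos is not None for pos in signature_report(sequence).values())

# Hypothetical toy sequences, not real enzymes.
toy_both = "MT" + "GAGDQGAAAGF" + "LLLL" + "GGGAFSGKD" + "PP"
toy_first_only = "MT" + "GAGDQGAAAGF" + "LLLL"

if __name__ == "__main__":
    print(signature_report(toy_both), has_all_signatures(toy_both))
    print(signature_report(toy_first_only), has_all_signatures(toy_first_only))
```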

[1] Horikawa S., Sasuga J., Shimizu K., Ozasa H., Tsukada K. J. Biol. Chem. 265:13683-13686(1990).
[1452] 585. S1 RNA binding domain

The S1 domain occurs in a wide range of RNA-associated proteins. It is structurally similar to cold shock protein, which binds nucleic acids. The S1 domain has an OB-fold structure.
[1] Bycroft M, Hubbard TJ, Proctor M, Freund SM, Murzin AG; Cell 1997;88:235-242. 
[1453] 586. SAICAR synthetase signatures 
Phosphoribosylaminoimidazole-succinocarboxamide synthase (EC 6.3.2.6) (SAICAR synthetase) catalyzes the seventh step in the de novo purine biosynthetic pathway; the ATP-dependent conversion of 5'-phosphoribosyl-5-aminoimidazole-4-carboxylic acid and aspartic acid to SAICAR [1]. In bacteria (gene purC), fungi (gene ADE1) and plants, SAICAR synthetase is a monofunctional protein; in higher vertebrates it is the N-terminal domain of a bifunctional enzyme that also catalyzes phosphoribosylaminoimidazole carboxylase (AIRC) activity. Two conserved regions in the central section of this enzyme have been selected as signature patterns for SAICAR synthetase.
Consensus pattern: [LIVMF](2)-P-[LIVM]-E-x-[LIVM]-[LIVMCA]-R-x(3)-[TA]-G-S
Consensus pattern: [LIVM]-[LIVMA]-D-x-K-[LIVMFY]-E-F-G
[1] Zalkin H., Dixon J.E. Prog. Nucleic Acid Res. Mol. Biol. 42:259-287(1992).
[1454] 587. (SCP) Extracellular proteins SCP/Tpx-1/Ag5/PR-1/Sc7 signatures 
A variety of extracellular proteins from eukaryotes have been found to be evolutionarily related: - Rodent sperm-coating glycoprotein (SCP), also known as acidic epididymal glycoprotein (AEG). This protein is thought to be involved in sperm maturation [1]. It is a protein of about 220 residues and probably contains eight disulfide bonds. - Mammalian testis-specific protein Tpx-1 [2]. Tpx-1 is highly related to SCPs. - Mammalian glioma pathogenesis-related protein (GliPR). - Lizard helothermine, a toxin that blocks ryanodine receptors. - Venom allergen 5 (Ag5) from vespid wasps and venom allergen 3 (Ag3) from fire ants. These proteins are potent allergens and are the main cause of allergic reactions to stings from insects of the hymenoptera family [3]. Ag5/3 are proteins of about 200 residues and contain four disulfide bonds. - Plant pathogenesis proteins of the PR-1 family [4]. These proteins are synthesized during pathogen infection or other stress-related responses. They are proteins of about 130 to 140 residues and probably contain three disulfide bonds. - Proteins Sc7 and Sc14 from the basidiomycete fungus Schizophyllum commune. These extracellular proteins are loosely associated with fruit body hyphal walls [5]. Sc7/14 are proteins of about 180 residues and probably contain two disulfide bonds. - Ancylostoma secreted protein from dog hookworm. - Yeast hypothetical proteins YJL078c, YJL079c and YKR013w. The exact function of these proteins is not yet known. Two conserved regions located in their C-terminal half have been selected as signature patterns. The second signature contains a cysteine which is known to be involved in a disulfide bond in Ag5.
Consensus pattern: [GDER]-H-[FYWH]-T-Q-[LIVM](2)-W-x(2)-[STN]

Consensus pattern: [LIVMFYH]-[LIVMFY]-x-C-[NQRHS]-Y-x-[PARH]-x-[GL]-N-[LIVMFYWDN] [C is involved in a disulfide bond]
[1] Mizuki N., Kasahara M. Mol. Cell. Endocrinol. 89:25-32(1992). [2] Kasahara M., Gutknecht J., Brew K., Spurr N., Goodfellow P.N. Genomics 5:527-534(1989). [3] Lu G., Villalba M., Coscia M.R., Hoffman D.R., King T.P. J. Immunol. 150:2823-2830(1993). [4] Dixon D.C., Cutt J.R., Klessig D.F. EMBO J. 10:1317-1324(1991). [5] Schuren F.H.J., Asgeirsdottir S.A., Kothe E.M., Scheer J.M.J., Wessels J.G.H. J. Gen. Microbiol. 139:2083-2090(1993).
[1455] 588. SET domain 

SET domains appear to be protein-protein interaction domains. It has been demonstrated that SET domains mediate interactions with a family of proteins that display similarity with dual-specificity phosphatases (dsPTPases) [2].

[1] Tripoulas N, LaJeunesse D, Gildea J, Shearn A; Genetics 1996;143:913-928. [2] Cui X, De Vivo I, Slany R, Miyamoto A, Firestein R, Cleary ML; Nat Genet 1998;18:331-337.
[1456] 589. Src homology 3 (SH3) domain profile 

The Src homology 3 (SH3) domain is a small protein domain of about 60 amino-acid residues first identified as a conserved sequence in the non-catalytic part of several cytoplasmic protein tyrosine kinases (e.g. Src, Abl, Lck) [1]. Since then, it has been found in a great variety of other intracellular or membrane-associated proteins [2,3,4,5]. The SH3 domain has a characteristic fold which consists of five or six beta-strands arranged as two tightly packed anti-parallel beta sheets. The linker regions may contain short helices [6]. The function of the SH3 domain is not well understood. The current opinion is that SH3 domains mediate assembly of specific protein complexes via binding to proline-rich peptides [7]. In general SH3 domains are found as single copies in a given protein, but there is a significant number of proteins with two SH3 domains and a few with 3 or 4 copies. So far, SH3 domains have been identified in the following proteins: - Many vertebrate, invertebrate and retroviral cytoplasmic (non-receptor) protein tyrosine kinases. In particular in the Src, Abl, Btk, Csk and ZAP70 families of kinases. - Mammalian phosphatidylinositol-specific phospholipase C-gamma-1 and -2. - Mammalian phosphatidylinositol 3-kinase regulatory p85 subunit. - Mammalian Ras GTPase-activating protein (GAP). - Adaptor proteins mediating binding of guanine nucleotide exchange factors to growth factor receptors: vertebrate GRB2, Caenorhabditis elegans sem-5 and Drosophila DRK, all of which have two SH3 domains. - Mammalian Vav oncoprotein, a guanine nucleotide exchange factor of the CDC24 family. - Some guanine-nucleotide releasing factors of the CDC25 family: yeast CDC25, yeast SCD25, fission yeast ste6. - MAGUK proteins. These proteins consist of at least three types of domains: one or more copies of the DHR domain, an SH3 domain and a C-terminal guanylate kinase domain. Members of this family are: Drosophila lethal(1)discs large-1 tumor suppressor protein (gene Dlg1), mammalian tight junction protein ZO-1, vertebrate erythrocyte membrane protein p55, Caenorhabditis elegans protein lin-2, rat protein CASK and mammalian synaptic proteins SAP90/PSD-95, CHAPSYN-110/PSD-93, SAP97/DLG1 and SAP102. - Miscellaneous proteins interacting with vertebrate receptor protein tyrosine kinases: mammalian cytoplasmic protein Nck (3 copies), oncoprotein Crk (2 copies). - Chicken Src substrate p80/85 protein (cortactin) and the similar human hemopoietic lineage cell specific protein Hs1. - Mammalian dihydropyridine-sensitive L-type calcium channel beta (regulatory) subunit including the related human myasthenic syndrome antigen B (MSYB). - Mammalian neutrophil cytosolic activators of NADPH oxidase: p47 (NCF-1), p67 (NCF-2), and a potential homolog from Caenorhabditis elegans (B0303.7). NCF-1 and -2 have two copies of the SH3 domain, while B0303.7 has four. - Some myosin heavy chains from amoebae, slime molds and yeast (gene MYO3). - Vertebrate and Drosophila spectrin and fodrin alpha-chain. - Human amphiphysin. - Yeast actin-binding protein ABP1. - Yeast actin-binding protein SLA1 (3 copies). - Yeast protein BEM1 and the fission yeast homolog scd2 (or ral3) (2 copies). - Yeast BEM1-binding proteins BOI2 (BEB1) and BOB1 (BOI1). - Yeast fusion protein FUS1. - Yeast protein RVS167. - Yeast protein SSU81. - Yeast hypothetical proteins YAR014c (1 copy), YFR024c (1 copy), YHL002w (1 copy), YHR016c (1 copy), YJL020c (1 copy), YHR114w (2 copies) and the fission yeast homolog SpAC12C2.05c. - Caenorhabditis elegans hypothetical protein F42H10.3. The profile developed to detect SH3 domains is based on a structural alignment consisting of 5 gap-free blocks and 4 linker regions totaling 62 match positions.

[1] Mayer B.J., Hamaguchi M., Hanafusa H. Nature 332:272-275(1988). [2] Musacchio A., Gibson T., Lehto V.P., Saraste M. FEBS Lett. 307:55-61(1992). [3] Pawson T., Schlessinger J. Curr. Biol. 3:434-442(1993). [4] Mayer B.J., Baltimore D. Trends Cell Biol. 3:8-13(1993). [5] Pawson T. Nature 373:573-580(1995). [6] Kuriyan J., Cowburn D. Curr. Opin. Struct. Biol. 3:828-837(1993). [7] Morton C.J., Campbell I.D. Curr. Biol. 4:615-617(1994).
[1457] 590. Serine hydroxymethyltransferase pyridoxal-phosphate attachment site (SHMT)

Serine hydroxymethyltransferase (EC 2.1.2.1) (SHMT) [1] catalyzes the transfer of the hydroxymethyl group of serine to tetrahydrofolate to form 5,10-methylenetetrahydrofolate and glycine. In vertebrates, it exists in a cytoplasmic and a mitochondrial form whereas only one form is found in prokaryotes. Serine hydroxymethyltransferase is a pyridoxal-phosphate containing enzyme. The pyridoxal-P group is attached to a lysine residue around which the sequence is highly conserved in all forms of the enzyme.

Consensus pattern: [DEH]-[LIVMFY]-x-[STMV]-[GST]-[ST](2)-H-K-[ST]-[LF]-x-G-[PAC]-[RQ]-[GSA]-[GA] [K is the pyridoxal-P attachment site]

[1] Usha R., Savithri H.S., Rao N.A. Biochim. Biophys. Acta 1204:75-83(1994).
[1458] 591. SIS domain 

SIS (Sugar ISomerase) domains are found in many phosphosugar isomerases and phosphosugar binding proteins.
[1] Teplyakov A, Obmolova G, Badet-Denisot MA, Badet B, Polikarpov I; Structure 1998;6:1047-1055.

[1459] 592. (SKI) Shikimate kinase signature

Shikimate kinase (EC 2.7.1.71) catalyzes the fifth step in the biosynthesis from chorismate of the aromatic amino acids (the shikimate pathway) in bacteria (gene aroK or aroL), plants and in fungi (where it is part of a multifunctional enzyme which catalyzes five consecutive steps in this pathway). Shikimate kinase is a small protein of about 200 residues. A conserved region that contains a run of three glycines has been selected as a signature pattern.

Consensus pattern: [KR]-x(2)-E-x(3)-[LIVMF]-x(8,12)-[LIVMF](2)-[SA]-x-G(3)-x-[LIVMF]. Proteins belonging to this family also contain a copy of the ATP/GTP-binding motif 'A' (P-loop).
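
The shikimate kinase pattern combines a variable-length spacer, `x(8,12)`, with a fixed run of three glycines, `G(3)`. The sketch below, written under the assumption that a plain regular expression is an acceptable stand-in for a pattern scanner, shows how the spacer length decides whether a sequence matches. The helper `toy_sequence` builds hypothetical strings that differ only in spacer length.

```python
import re

# Hand translation of
# [KR]-x(2)-E-x(3)-[LIVMF]-x(8,12)-[LIVMF](2)-[SA]-x-G(3)-x-[LIVMF]
SHIKIMATE_KINASE = re.compile(
    r"[KR].{2}E.{3}[LIVMF].{8,12}[LIVMF]{2}[SA].G{3}.[LIVMF]"
)

def toy_sequence(spacer_length):
    """Build a hypothetical sequence whose spacer has the given length."""
    head = "KAAEAAAL"            # [KR] x x E x x x [LIVMF]
    spacer = "P" * spacer_length  # the x(8,12) region
    tail = "LISAGGGAV"           # [LIVMF](2) [SA] x G G G x [LIVMF]
    return head + spacer + tail

def matches(spacer_length):
    """True if the toy sequence with this spacer length fits the pattern."""
    return SHIKIMATE_KINASE.search(toy_sequence(spacer_length)) is not None

if __name__ == "__main__":
    for n in range(6, 15):
        print(n, matches(n))
```

Only spacer lengths inside the stated range are accepted, which is the behavior the `x(8,12)` element encodes.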
[1460] 593. SNAP-25 family 

[1461] SNAP-25 (synaptosome-associated protein 25 kDa) proteins are components of SNARE complexes. Members of this family contain a cluster of cysteine residues that can be palmitoylated for membrane attachment [2].
[1462] [1] Brennwald P, Kearns B, Champion K, Keranen S, Bankaitis V, Novick P; Cell 1994;79:245-250. [2] Risinger C, Blomqvist AG, Lundell I, Lambertsson A, Nassel D, Pieribone VA, Brodin L, Larhammar D; J Biol Chem 1993;268:24408-24414.

[1463] 594. SNF2 and others N-terminal domain 

[1464] This domain is found in proteins involved in a variety of processes including transcription regulation (e.g., SNF2, STH1, brahma, MOT1), DNA repair (e.g., ERCC6, RAD16, RAD5), DNA recombination (e.g., RAD54), and chromatin unwinding (e.g., ISWI) as well as a variety of other proteins with little functional information (e.g., lodestar, ETL1).

[1465] 595. Staphylococcal nuclease homologues (SNase)



Present in all three domains of cellular life. Four copies in the transcriptional coactivator p100. These, however, appear to lack the active site residues of Staphylococcal nuclease. Positions 14 (Asp-21), 34 (Arg-35), 39 (Asp-40), 42 (Glu-43) and 110 (Arg-87) [SNase numbering in parentheses] are thought to be involved in substrate-binding and catalysis.

[1] Ponting CP; Protein Sci 1997;6:459-463. [2] Callebaut I, Mornon JP; Biochem J 1997;321:125-132.
[1466] 596. SPRY domain

The SPRY domain is named from SPla and the RYanodine Receptor. Domain of unknown function. Distant homologues are domains in butyrophilin/marenostrin/pyrin homologues.
[1] Ponting C, Schultz J, Bork P; Trends Biochem Sci 1997;22:193-194.

[1467] 597. (SQS PSY) Squalene and phytoene synthases signatures

Two different polyisoprene synthases have been shown [1,2,3] to share a number of regions of sequence similarities: - Squalene synthase (EC 2.5.1.21) (farnesyl-diphosphate farnesyltransferase) (SQS), which catalyzes the conversion of two molecules of farnesyl diphosphate (FPP) into squalene. It is the first committed step in the cholesterol biosynthetic pathway. The reaction carried out by SQS is catalyzed in two separate steps: the first is a head-to-head condensation of the two molecules of FPP to form presqualene diphosphate; this intermediate is then rearranged in a NADP-dependent reduction, to form squalene. SQS is found in eukaryotes. In yeast it is encoded by the ERG9 gene, in mammals by the FDFT1 gene. SQS seems to be membrane-bound. - Phytoene synthase (EC 2.5.1.-) (PSY), which catalyzes the conversion of two molecules of geranylgeranyl diphosphate (GGPP) into phytoene. It is the second step in the biosynthesis of carotenoids from isopentenyl diphosphate. The reaction carried out by PSY is catalyzed in two separate steps: the first is a head-to-head condensation of the two molecules of GGPP to form prephytoene diphosphate; this intermediate is then rearranged to form phytoene. PSY is found in all organisms that synthesize carotenoids: plants and photosynthetic bacteria as well as some non-photosynthetic bacteria and fungi. In bacteria PSY is encoded by the gene crtB. In plants PSY is localized in the chloroplast. As can be seen from the description above, both SQS and PSY share a number of functional similarities which are also reflected at the level of their primary structure. In particular three well conserved regions are shared by SQS and PSY; they could be involved in substrate binding and/or the catalytic mechanism. Signature patterns have been developed for the second and third conserved regions; they are localized in the central part of these enzymes.

Consensus pattern: Y-[CSAM]-x(2)-[VSG]-A-[GSA]-[LIVAT]-[IV]-G-x(2)-[LMSC]-x(2)-[LIV]
Consensus pattern: [LIVM]-G-x(3)-Q-x(2,3)-N-[IF]-x-R-D-[LIVMFY]-x(2)-[DE]-x(4,7)-R-x-[FY]-x-P
[1] Summers C., Karst F., Charles A.D. Gene 136:185-192(1993). [2] Robinson G.W., Tsay Y.H., Kienzle B.K., Smith-Monroy C.A., Bishop R.W. Mol. Cell. Biol. 13:2706-2727(1993). [3] Roemer S., Hugueney P., Bouvier F., Camara B., Kuntz M. Biochem. Biophys. Res. Commun. 196:1414-1421(1993).
[1468] 598. SRP54-type proteins GTP-binding domain signature

The signal recognition particle (SRP) is an oligomeric complex that mediates targeting and insertion of the signal sequence of exported proteins into the membrane of the endoplasmic reticulum. SRP consists of a 7S RNA and six protein subunits. One of these subunits, the 54 Kd protein (SRP54), is a GTP-binding protein that interacts with the signal sequence when it emerges from the ribosome. The N-terminal 300 residues of SRP54 include the GTP-binding site (G-domain) and are evolutionarily related to similar domains in other proteins which are listed below [1]. - Escherichia coli and Bacillus subtilis ffh protein (P48), a protein which seems to be the prokaryotic counterpart of SRP54. Ffh is associated with a 4.5S RNA in the prokaryotic SRP complex. - Signal recognition particle receptor alpha subunit (docking protein), an integral membrane GTP-binding protein which ensures, in conjunction with SRP, the correct targeting of nascent secretory proteins to the endoplasmic reticulum membrane. The G-domain is located at the C-terminal extremity of the protein. - Bacterial ftsY protein, a protein which is believed to play a similar role to that of the docking protein in eukaryotes. The G-domain is located at the C-terminal extremity of the protein. - The pilA protein from Neisseria gonorrhoeae which seems to be the homolog of ftsY. - A protein from the archaebacterium Sulfolobus solfataricus. This protein is also believed to be a docking protein. The G-domain is also at the C-terminus. - Bacterial flagellar biosynthesis protein flhF. The best conserved regions in those domains are the sequence motifs that are part of the GTP-binding site, but as those regions are not specific to these proteins, they were not used as a signature pattern. Instead a conserved region located at the C-terminal end of the domain was selected.

Consensus pattern: P-[LIVM]-x-[FYL]-[LIVMAT]-[GS]-x-[GS]-[EQ]-x(4)-[LIVMF]
[1] Althoff S., Selinger D., Wise J.A. Nucleic Acids Res. 22:1933-1947(1994).

[1469] 599. (STphosphatase) Serine/threonine specific protein phosphatases signature

Serine/threonine specific protein phosphatases (EC 3.1.3.16) (PP) [1,2,3] are enzymes that catalyze the removal of a phosphate group attached to a serine or threonine residue. [...] evolutionary related. - Protein phosphatase-1 (PP1) is an enzyme of broad specificity. It is inhibited by two thermostable proteins, inhibitor-1 and -2. In mammals, there are two closely related isoforms of PP-1: PP-1alpha and PP-1beta, produced by alternative splicing of the same gene. In Emericella nidulans, PP-1 (gene bimG) plays an important role in mitosis control by reversing the action of the nimA kinase. In yeast, PP-1 (gene SIT4) is involved in dephosphorylating the large subunit of RNA polymerase II. - Protein phosphatase-2A (PP2A) is also an enzyme of broad specificity. PP2A is a trimeric enzyme that consists of a core composed of a catalytic subunit associated with a 65 Kd regulatory subunit and a third variable subunit. In mammals, there are two closely related isoforms of the catalytic subunit of PP2A: PP2A-alpha and PP2A-beta, encoded by separate genes. - Protein phosphatase-2B (PP2B or calcineurin), a calcium-dependent enzyme whose activity is stimulated by calmodulin. It is composed of two subunits: the catalytic A-subunit and the calcium-binding B-subunit. The specificity of PP2B is restricted. In addition to the above-mentioned enzymes, some additional serine/threonine specific protein phosphatases have been characterized and are listed below. - Mammalian phosphatase-X (PP-X), and Drosophila phosphatase-V (PP-V) which are closely related but yet distinct from PP2A. - Yeast phosphatase PPH3, which is similar to PP2A, but with different enzymatic properties. - Drosophila phosphatase-Y (PP-Y), and yeast phosphatases Z1 and Z2 (genes PPZ1 and PPZ2) which are closely related but yet distinct from PP1. - Drosophila retinal degeneration protein C (gene rdgC), a calcium-binding phosphatase required to prevent light-induced retinal degeneration. - Phages Lambda and Phi-80 ORF-221 which have been shown to have phosphatase activity and are related to mammalian PPs. The best conserved region in these proteins is a perfectly conserved pentapeptide that can be used as a signature pattern.
Consensus pattern: [LIVM]-R-G-N-H-E

[1] Cohen P. Annu. Rev. Biochem. 58:453-508(1989). [2] Cohen P., Cohen P.T.W. J. Biol. Chem. 264:21435-21438(1989). [3] Cohen P.T.W., Brewis N.D., Hughes V., Mann D.J. FEBS Lett. 268:355-359(1990).
[1470] 600. Translation initiation factor SUI1 signature

In budding yeast (Saccharomyces cerevisiae), SUI1 is a translation initiation factor that functions in concert with eIF-2 and the initiator tRNA-Met in directing the ribosome to the proper start site of translation [1]. SUI1 is a protein of 108 residues. Close homologs of SUI1 have been found [2] in mammals, insects and plants. SUI1 is also evolutionarily related to hypothetical proteins from Escherichia coli (yciH), Haemophilus influenzae (HI1225) and Methanococcus vannielii. A conserved region in the C-terminal section has been selected as a signature pattern.
Consensus pattern: [LIVM]-[EQ]-[LIVM]-Q-G-[DEN]-[KHQ]-[KRV]

[1] Yoon H., Donahue T.F. Mol. Cell. Biol. 12:248-260(1992). [2] Fields C.A., Adams M.D. Biochem. Biophys. Res. Commun. 198:288-291(1994).

[1471] 601. (S T dehydratase) Serine/threonine dehydratases pyridoxal-phosphate attachment site
Serine and threonine dehydratases [1,2] are functionally and structurally related pyridoxal-phosphate dependent enzymes: - L-serine dehydratase (EC 4.2.1.13) and D-serine dehydratase (EC 4.2.1.14) catalyze the dehydratation of L-serine (respectively D-serine) into ammonia and pyruvate. - Threonine dehydratase (EC 4.2.1.16) (TDH) catalyzes the dehydratation of threonine into alpha-ketobutyrate and ammonia. In Escherichia coli and other microorganisms, two classes of TDH are known to exist. One is involved in the biosynthesis of isoleucine, the other in hydroxamino acid catabolism. Threonine synthase (EC 4.2.99.2) is also a pyridoxal-phosphate enzyme; it catalyzes the transformation of homoserine-phosphate into threonine. It has been shown [3] that threonine synthase is distantly related to the serine/threonine dehydratases. In all these enzymes, the pyridoxal-phosphate group is attached to a lysine residue. The sequence around this residue is sufficiently conserved to allow the derivation of a pattern specific to serine/threonine dehydratases and threonine synthases.

Consensus pattern: [DESH]-x(4,5)-[STVG]-x-[AS]-[FYI]-K-[DLIFSA]-[RVMF]-[GA]-[LIVMGA] [The K is the pyridoxal-P attachment site]

[1] Ogawa H., Gomi T., Konishi K., Date T., Nakashima H., Nose K., Matsuda Y., Peraino C., Pitot H.C., Fujioka M. J. Biol. Chem. 264:15818-15823(1989). [2] Datta P., Goss T.J., Omnaas J.R., Patil R.V. Proc. Natl. Acad. Sci. U.S.A. 84:393-397(1987). [3] Parsot C. EMBO J. 5:3013-3019(1986). [4] Grabowski R., Hofmeister A.E.M., Buckel W. Trends Biochem. Sci. 18:297-300(1993).

[1472] Cysteine synthase/cystathionine beta-synthase P-phosphate attachment site

Cysteine synthase (CSase) is the pyridoxal-phosphate dependent enzyme responsible [1] for the formation of cysteine from O-acetyl-serine and hydrogen sulfide with the concomitant release of acetic acid. In bacteria such as Escherichia coli, two forms of the enzyme are known (genes cysK and cysM). In plants there are also two forms, one located in the cytoplasm and the other in chloroplasts. Cystathionine beta-synthase [2] catalyzes the first irreversible step in homocysteine transsulfuration; the conjugation of homocysteine and serine forming cystathionine. Like CSase it is a pyridoxal-phosphate dependent enzyme. The two types of enzymes are evolutionarily related. The pyridoxal-phosphate group of CSases has been shown to be attached to a lysine residue which is located in the N-terminal section of these enzymes; the sequence around this residue is highly conserved and can be used as a signature pattern to detect this class of enzymes.

Consensus pattern: K-x-E-x(3)-[PA]-[STAGC]-x-S-[IVAP]-K-x-R-x-[STAG]-x(2)-[LIVM] [The 2nd K is the pyridoxal-P attachment site]

[1] Saito K., Kurosawa M., Murakoshi I. FEBS Lett. 328:111-114(1993). [2] Swaroop M., Bradley K., Ohura T., Tahara T., Roper M.D., Rosenberg L.E., Kraus J.P. J. Biol. Chem. 267:11455-11461(1992).
[1473] 602. S locus glycop 

S-locus glycoprotein family. In Brassicaceae, self-incompatible plants have a self/non-self recognition system. This is sporophytically controlled by multiple alleles at a single locus (S). S-locus glycoproteins, as well as S-receptor kinases, are in linkage with the S-alleles [1]. Number of members: 128
[1] Evolutionary aspects of the S-related genes of the Brassica self-incompatibility system: synonymous and nonsynonymous base substitutions. Hinata K, Watanabe M, Yamakawa S, Satta Y, Isogai A; Genetics 1995;140:1099-1104.
[2] Polymorphism of the S-locus glycoprotein gene (SLG) and the S-locus related gene (SLR1) in Raphanus sativus L. and self-incompatible ornamental plants in the Brassicaceae. Sakamoto K, Kusaba M, Nishio T; Mol Gen Genet 1998;258:397-403.

[1474] 603. (sdh cyt) Succinate dehydrogenase cytochrome b subunit signatures 

Succinate dehydrogenase (SDH) is a membrane-bound complex of two main components: a membrane-extrinsic
component composed of an FAD-binding flavoprotein and an iron-sulfur protein, and a hydrophobic component composed
of a cytochrome b and a membrane anchor protein. The cytochrome b component is a mono-heme transmembrane
protein [1,2,3] belonging to a family that groups: - Cytochrome b-556 from bacterial SDH (gene sdhC). - Cytochrome
b560 from the mammalian mitochondrial SDH complex. - Cytochrome b560 subunit encoded in the mitochondrial
genome of some algae and in the plant Marchantia polymorpha. - Cytochrome b from yeast mitochondrial SDH complex
(gene SDH3 or CYB3). - Protein cyt-1 from Caenorhabditis. These cytochromes are proteins of about 130 residues that
comprise three transmembrane regions. There are two conserved histidines which may be involved in binding the heme
group. Two signature patterns have been developed that include these histidine residues.

Consensus pattern: R-P-[LIVMT]-x(3)-[LIVM]-x(6)-[LIVMWPK]-x(4)-S-x(2)-H-R-x-[ST] [H could be a heme ligand]
Consensus pattern: H-x(3)-[GA]-[LIVMT]-R-[HF]-[LIVMF]-x-[FYWM]-D-x-[GVA] [H could be a heme ligand]
[1] Yu L., Wei Y.-Y., Usui S., Yu C.-A. J. Biol. Chem. 267:24508-24515(1992).
[2] Abraham P.R., Mulder A., Van't Riet J., Raue H.A. Mol. Gen. Genet. 242:708-716(1994).
[3] Leblanc C., Boyen C., Richard O., Bonnard G., Grienenberger J.M., Kloareg B. J. Mol. Biol. 250:484-495(1995).
[1475] 604. Sec1 family

[1] The Sec1 family: a novel family of proteins involved in synaptic transmission and general secretion. Halachmi N,
Lev Z; J Neurochem 1996;66:889-897.
Number of members: 40 

[1476] 605. Protein secE/sec61-gamma signature

In bacteria, the secE protein plays a role in protein export; it is one of the components, with secY and secA, of the
preprotein translocase. In eukaryotes, the evolutionarily related protein sec61-gamma plays a role in protein
translocation through the endoplasmic reticulum; it is part of a trimeric complex that also consists of sec61-alpha and
beta [1]. Both secE and sec61-gamma are small proteins of about 60 to 90 amino acids that contain a single
transmembrane region at their C-terminal extremity (Escherichia coli secE is an exception, in that it possesses an extra
N-terminal segment of 60 residues that contains two additional transmembrane domains). The sequence of
secE/sec61-gamma is not extremely well conserved; however, it is possible to derive a signature pattern centered on a
conserved proline located 10 residues before the beginning of the transmembrane domain.

Consensus pattern: [LIVMFY]-x(2)-[DENQGA]-x(4)-[LIVMFTA]-x-[KRV]-x(2)-[KW]-P-x(3)-[SEQ]-x(7)-[LIVT]-[U

[LIVFGAST] 

[1] Hartmann E., Sommer T., Prehn S., Goerlich D., Jentsch S., Rapoport T.A. Nature 367:654-657(1994).
[1477] 606. 11-S plant seed storage proteins signature
Plant seed storage proteins, whose principal function appears to be the major nitrogen source for the developing plant,
can be classified, on the basis of their structure, into different families. 11-S are non-glycosylated proteins which form
hexameric structures [1,2]. Each of the subunits in the hexamer is itself composed of an acidic and a basic chain
derived from a single precursor and linked by a disulfide bond. This structure is shown in the following representation.

           +-------------------------+
           |                         |
xxxxxxxxxxxCxxxxxxxxxxxxxxxxxxxxxxNGxCxxxxxxxxxxxxxxxxxxxxxxx
                                  *********
<---------Acidic-subunit----------><-----Basic-subunit------>
<-----------------About-480-to-500-residues----------------->

'C': conserved cysteine involved in a disulfide bond. '*': position of the pattern.

Proteins that belong to the 11-S family are: pea and broad bean legumins, rape cruciferin, rice glutelins, cotton
beta-globulins, soybean glycinins, pumpkin 11-S globulin, oat globulin, sunflower helianthinin G3, etc. The region that
includes the conserved cleavage site between the acidic and basic subunits (Asn-Gly) and a proximal cysteine residue
which is involved in the interchain disulfide bond has been used as a signature pattern for this family of proteins.

Consensus pattern: N-G-x-[DE](2)-x-[LIVMF]-C-[ST]-x(11,12)-[PAG]-D [C is involved in a disulfide bond]
[1] Hayashi M., Mori H., Nishimura M., Akazawa T., Hara-Nishimura I. Eur. J. Biochem. 172:627-632(1988).
[2] Shotwell M.A., Afonso C., Davies E., Chesnut R.S., Larkins B.A. Plant Physiol. 87:698-704(1988).
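
The 11-S signature above begins at the Asn-Gly cleavage site, so a match also indicates where the precursor would be cut into its acidic and basic chains. The following is a minimal sketch of that idea, assuming the consensus pattern is translated by hand into a regular expression; the function name, the constant name, and the test sequence are hypothetical and do not come from the source.

```python
import re

# Hand translation of N-G-x-[DE](2)-x-[LIVMF]-C-[ST]-x(11,12)-[PAG]-D.
# x(11,12) becomes a variable-length gap of 11 or 12 residues.
SIGNATURE_11S = re.compile(r"NG.[DE]{2}.[LIVMF]C[ST].{11,12}[PAG]D")


def split_11s_precursor(sequence):
    """Split a precursor at the Asn-Gly bond of the first signature match.

    Returns (acidic_chain, basic_chain), or None if the signature is absent.
    """
    match = SIGNATURE_11S.search(sequence)
    if match is None:
        return None
    # The match starts at Asn; the cut falls between Asn and Gly.
    cleavage = match.start() + 1
    return sequence[:cleavage], sequence[cleavage:]


# Made-up sequence built only to satisfy the pattern (not a real protein).
example = "MAKS" + "NGQEEQLCS" + "Q" * 11 + "PD" + "KK"
chains = split_11s_precursor(example)
```

A real analysis would also check that the match falls at a plausible position in a precursor of about 480 to 500 residues, since a short pattern can match by chance.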
[1478] 607. 7S seed storage protein

[1479] 7S globulin is one of the main storage proteins of most angiosperms and gymnosperms. The 7S storage 
proteins are homotrimers. 
Number of members: 67 
[1480] [1] The three-dimensional structure of canavalin from jack bean (Canavalia ensiformis). Ko TP, Ng JD,
McPherson A; Plant Physiol 1993;101:729-744.

[1481] 608. Aspartate-semialdehyde dehydrogenase signature 

Aspartate-semialdehyde dehydrogenase (ASD) catalyzes the second step in the common biosynthetic pathway leading
from Asp to diaminopimelate and Lys, to Met, and to Thr: the NADP-dependent reductive dephosphorylation of
L-aspartyl phosphate to L-aspartate-semialdehyde. In bacteria and fungi, ASD is a protein of about 40 Kd (340 to 370
residues) whose sequence is not extremely well conserved [1]. A conserved cysteine residue has been implicated as
important for the catalytic activity [2]. The region of conservation around the active site residue is too small to be used
as a signature pattern. Another more conserved region, located in the last third of the sequence, and which contains
both a conserved cysteine and a histidine, has been used instead.

Consensus pattern: [LIVM]-[SADN]-x(2)-C-x-R-[LIVM]-x(4)-[GSC]-H-[STA]

[1] Baril C., Richaud C., Fournie E., Baranton G., Saint Girons I. J. Gen. Microbiol. 138:47-53(1992).
[2] Karsten W.E., Viola R.E. Biochim. Biophys. Acta 1121:234-238(1992).

[1482] N-acetyl-gamma-glutamyl-phosphate reductase active site. N-acetyl-gamma-glutamyl-phosphate reductase
(EC 1.2.1.38) (AGPR) [1,2] is the enzyme that catalyzes the third step in the biosynthesis of arginine from glutamate,
the NADP-dependent reduction of N-acetyl-5-glutamyl phosphate into N-acetylglutamate 5-semialdehyde. In bacteria
it is a monofunctional protein of 35 to 38 Kd (gene argC) while in fungi it is part of a bifunctional mitochondrial enzyme
(gene ARG5,6, arg11 or arg-6) which contains an N-terminal acetylglutamate kinase (EC 2.7.2.8) domain and a C-terminal
AGPR domain. In the Escherichia coli enzyme, a cysteine has been shown to be implicated in the catalytic activity; the
region around this residue is well conserved and can be used as a signature pattern.

Consensus pattern: [LIVM]-[GSA]-x-P-G-C-[FY]-[AVP]-T-[GA]-x(3)-[GTAC]-[LIVM]-x-P [C is the active site residue] 
[1] Ludovice M., Martin J.F., Carrachas P., Liras P. J. Bacteriol. 174:4606-4613(1992).
[2] Gessert S.F., Kim J.H., Nargang F.E., Weiss R.L. J. Biol. Chem. 269:8189-8203(1994).
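
Because the AGPR pattern is annotated with the position of its active site residue (the C in sixth position), a match can be used to report where that cysteine sits in a query sequence. The sketch below illustrates this under stated assumptions: the regular expression is a hand translation of the consensus pattern, and the function name, constant names, and test sequence are hypothetical.

```python
import re

# Hand translation of
# [LIVM]-[GSA]-x-P-G-C-[FY]-[AVP]-T-[GA]-x(3)-[GTAC]-[LIVM]-x-P
AGPR_REGEX = re.compile(r"[LIVM][GSA].PGC[FY][AVP]T[GA].{3}[GTAC][LIVM].P")

# The annotated cysteine is the sixth pattern element (zero-based offset 5).
ACTIVE_SITE_OFFSET = 5


def find_active_site_cysteines(sequence):
    """Return 1-based positions of the candidate active-site cysteine.

    Only non-overlapping matches are reported, which is how
    re.finditer scans a string.
    """
    return [
        match.start() + ACTIVE_SITE_OFFSET + 1
        for match in AGPR_REGEX.finditer(sequence)
    ]


# Made-up sequence that satisfies the pattern (not a real AGPR).
positions = find_active_site_cysteines("MKLGAPGCFATGAAAGLAPEE")
```

The same offset bookkeeping applies to any pattern in this listing that flags a specific residue, such as the serine and histidine of the serine carboxypeptidase signatures.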
[1483] 609. Sialyltransferase family
Number of members: 18

[1484] 610. SpoU rRNA Methylase family

This family of proteins probably use S-AdoMet. Number of members: 58 

[1] SpoU protein of Escherichia coli belongs to a new family of putative rRNA methylases. Koonin EV, Rudd KE; Nucleic
Acids Res 1993;21:5519-5519.
[2] The spoU gene of Escherichia coli, the fourth gene of the spoT operon, is essential for tRNA (Gm18)
2'-O-methyltransferase activity. Persson BC, Jager G, Gustafsson C; Nucleic Acids Res 1997;25:4093-4097.

[1485] 611. Stathmin family signatures 

Stathmin [1] (from the Greek 'stathmos', which means relay) is a ubiquitous intracellular protein, present in a variety
of phosphorylated forms, which serves as a relay for diverse second messenger pathways. Its expression and
phosphorylation are regulated throughout development and in response to extracellular signals regulating cell
proliferation, differentiation and function. Stathmin is a highly conserved protein of 149 amino acid residues.
Structurally, it consists of an N-terminal domain of about 45 residues followed by a 78 residue alpha-helical domain
consisting of a heptad repeat coiled coil structure and a C-terminal domain of 25 residues. Protein SCG10 is a
neuron-specific, membrane-associated protein that accumulates in the growth cones of developing neurons. It is highly
similar in its sequence to stathmin, but differs in that it contains an additional N-terminal hydrophobic segment of 32
residues which is probably responsible for its interaction with membranes. Xenopus protein XB3 is also evolutionarily
related to stathmin and also contains an additional N-terminal hydrophobic domain [2]. A conserved decapeptide which
ends with the first three residues of the coiled coil domain and a second pattern that corresponds to part of the central
region of the coiled coil have been selected as signatures for proteins of the stathmin family.

Consensus pattern: P-[KRQ]-[KR](2)-[DE]-x-S-L-[EG]-E-
Consensus pattern: A-E-K-R-E-H-E-[KR]-E-

[1] Sobel A. Trends Biochem. Sci. 16:301-305(1991).
[2] Maucuer A., Moreau J., Mechali M., Sobel A. J. Biol. Chem. 268:16420-16429(1993).

[1486] 612. SUA5/yciO/yrdC family signature. The following uncharacterized proteins have been shown [1] to share
regions of similarities: - Yeast protein SUA5. - Escherichia coli hypothetical protein yciO and HI1198, the corresponding
Haemophilus influenzae protein. - Escherichia coli hypothetical protein yrdC and HI0656, the corresponding
Haemophilus influenzae protein. - Bacillus subtilis hypothetical protein ywlC. - Mycobacterium leprae hypothetical
protein in rfe-hemK intergenic region. - Methanococcus jannaschii hypothetical protein MJ0062. These are proteins of
20 to 46 Kd which contain a number of conserved regions in their N-terminal section. They can be picked up in the
database by the following pattern.

[1487] Consensus pattern: [LIVMTA](3)-[LIVMFYC]-[PG]-T-[DE]-[STA]-x-[FYHQA]-[LIVM]-[GS]-
[1488] [1] Bairoch A., Rudd K.E., Robison K. Unpublished observations (1995).
[1489] 613. Sucrose synthase 
Sucrose synthases catalyse the synthesis of sucrose from UDP-glucose and fructose. This family includes the bulk of
the sucrose synthase protein. However, the carboxyl terminal region of the sucrose synthases belongs to the glycosyl
transferase family Glycos_transf_1.
[1490] 614. Sulfotransferase proteins
Number of members: 59

[1491] 615. Synaptophysin / synaptoporin signature 

Synaptophysin and synaptoporin [1] are structurally related proteins, found in the membrane of synaptic vesicles, which
may function as ionic or solute channels. These two glycoproteins seem to span the membrane four times. Both their
N- and C-terminal sequences seem to be cytoplasmically located. As a signature pattern for this family of proteins, a
highly conserved region located in the beginning of the first intravesicular loop, just after the first transmembrane
domain, has been selected. This region contains a cysteine residue that may be involved in a disulfide bond.
Consensus pattern: L-S-V-[DE]-C-x-N-K-T [C may be involved in a disulfide bond]
[1] Knaus P., Marqueze-Pouey B., Scherer H., Betz H. Neuron 5:453-462(1990).
[1492] 616. Syndecans signature 

Syndecans [1,2] (from the Greek syndein, to bind together) are a family of transmembrane heparan sulfate
proteoglycans which are implicated in the binding of extracellular matrix components and growth factors. Syndecans
bind a variety of molecules via their heparan sulfate chains and can act as receptors or as co-receptors. Structurally,
these proteins consist of four separate domains: a) A signal sequence; b) An extracellular domain (ectodomain) of
variable length and whose sequence is not evolutionarily conserved in the various forms of syndecans. The ectodomain
contains the sites of attachment of the heparan sulfate glycosaminoglycan side chains; c) A transmembrane region;
d) A highly conserved cytoplasmic domain of about 30 to 35 residues which could interact with cytoskeletal proteins.
The proteins known to belong to this family are: - Syndecan 1. - Syndecan 2 or fibroglycan. - Syndecan 3 or neuroglycan
or N-syndecan. - Syndecan 4 or amphiglycan or ryudocan. - Drosophila syndecan. - Caenorhabditis elegans probable
syndecan (F57C7.3). The signature pattern that has been developed for syndecans starts with the last residue of the
transmembrane region and includes the first 10 residues of the cytoplasmic domain. This region, which contains four
basic residues, could act as a stop transfer site.
Consensus pattern: [FY]-R-[IM]-[KR]-K(2)-D-E-G-S-Y 

[1] Bernfield M., Kokenyesi R., Kato M., Hinkes M.T., Spring J., Gallo R.L., Lose E.J. Annu. Rev. Cell Biol. 8:365-393(1992).
[2] David G. FASEB J. 7:1023-1030(1993).
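
The consensus patterns quoted throughout this listing share a compact syntax: elements separated by hyphens, x for any residue, bracketed alternatives, and repeat counts in parentheses. As a minimal sketch, assuming only that syntax (plus braces for excluded residues), a pattern such as the syndecan signature can be translated into a regular expression and run against a sequence. The function name prosite_to_regex and the test sequence are illustrative assumptions and are not part of the source.

```python
import re


def prosite_to_regex(pattern):
    """Translate a hyphen-separated consensus pattern into a Python regex.

    Supported elements: a single residue letter, x (any residue),
    [ABC] (allowed residues), {ABC} (excluded residues), each optionally
    followed by a repeat count such as (2) or a range such as (11,12).
    """
    parts = []
    for element in pattern.strip().split("-"):
        element = element.strip()
        if not element:
            continue  # tolerate a trailing hyphen
        # Separate the residue specification from an optional repeat count.
        m = re.fullmatch(r"(.+?)(?:\((\d+)(?:,(\d+))?\))?", element)
        core, low, high = m.group(1), m.group(2), m.group(3)
        if core == "x":
            token = "."
        elif core.startswith("[") and core.endswith("]"):
            token = core
        elif core.startswith("{") and core.endswith("}"):
            token = "[^" + core[1:-1] + "]"
        elif len(core) == 1 and core.isalpha() and core.isupper():
            token = core
        else:
            raise ValueError("unrecognized pattern element: %r" % element)
        if low is not None:
            token += "{" + low + ("," + high if high else "") + "}"
        parts.append(token)
    return "".join(parts)


syndecan_regex = prosite_to_regex("[FY]-R-[IM]-[KR]-K(2)-D-E-G-S-Y")
# Made-up sequence containing one stretch that satisfies the pattern.
hit = re.search(syndecan_regex, "LAVYRMKKKDEGSYSL")
```

Elements that were damaged in extraction (unbalanced brackets, stray characters) raise a ValueError rather than being guessed at, which seems the safer behavior for a listing like this one.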

[1493] 617. Syntaxin / epimorphin family signature

The following proteins have been shown to be evolutionarily related [1,2,3]: - Epimorphin (or syntaxin 2), a mammalian
mesenchymal protein which plays an essential role in epithelial morphogenesis. - Syntaxin 1A (also known as antigen
HPC-1) and syntaxin 1B, which are synaptic proteins which may be involved in docking of synaptic vesicles at
presynaptic active zones. - Syntaxin 3. - Syntaxin 4, which is potentially involved in docking of synaptic vesicles at
presynaptic active zones. - Syntaxin 5, which mediates endoplasmic reticulum to Golgi transport. - Syntaxin 6, which is
involved in intracellular vesicle trafficking. - Syntaxin 7. - Yeast PEP12 (or VPS6), which is required for the transport of
proteases to the vacuole. - Yeast SED5, which is required for the fusion of transport vesicles with the Golgi complex.
- Yeast SSO1 and SSO2, which are required for vesicle fusion with the plasma membrane. - Yeast VAM3, which is
required for vacuolar assembly. - Arabidopsis thaliana protein KNOLLE, which may be involved in cytokinesis.
- Caenorhabditis elegans hypothetical proteins F35C8.4, F48F7.2, F55A11.2 and T01B11.3. The above proteins share the
following characteristics: a size ranging from 30 Kd to 40 Kd; a C-terminal extremity which is highly hydrophobic and is
probably involved in anchoring the protein to the membrane; a central, well conserved region, which seems to be in a
coiled-coil conformation. The pattern specific for this family is based on the most conserved region of the coiled coil
domain.
Consensus pattern: [RQ]-x(3)-[LIVMA]-x(2)-[LIVM]-[ESH]-x(2)-[LIVMT]-x-[DEVM]-[LIVM]-x(2)-[LIVM]-[FS]-x(2)-
[LIVM]-x(3)-[LIVT]-x(2)-Q-[GADEQ]-x(2)-[LIVM]-[DNQT]-x-[LIVMF]-[DESV]-x(2)-[LIVM]

[1] Bennett M.K., Garcia-Arraras J.E., Elferink L.A., Peterson K., Fleming A.M., Hazuka C.D., Scheller R.H. Cell
74:863-873(1993).
[2] Spring J., Kato M., Bernfield M. Trends Biochem. Sci. 18:124-125(1993).
[3] Pelham H.R.B. Cell 73:425-426(1993).
[1494] 618. Sm protein 

The U1, U2, U4/U6 and U5 small nuclear ribonucleoprotein particles (snRNPs) involved in pre-mRNA splicing contain
seven Sm proteins (B/B', D1, D2, D3, E, F and G) in common, which assemble around the Sm site present in four of
the major spliceosomal small nuclear RNAs. These proteins contain a common sequence motif in two segments, Sm1
and Sm2, separated by a short variable linker.

[1495] [1] Hermann H, Fabrizio P, Raker VA, Foulaki K, Hornig H, Brahms H, Luhrmann R; EMBO J 1995;14:2076-2088.
[2] Kambach C, Walke S, Young R, Avis JM, de la Fortelle E, Raker VA, Luhrmann R, Li J, Nagai K; Cell
1999;96:375-387.
[1496] 619. Skp1 family 

[1497] [1] Stebbins CE, Kaelin WG Jr, Pavletich NP; Science 1999;284:455-461.
[1498] 620. Protein secY signatures

The eubacterial secY protein [1] plays an important role in protein export. It interacts with the signal sequences of
secretory proteins as well as with two other components of the protein translocation system: secA and secE. SecY is
an integral plasma membrane protein of 419 to 492 amino acid residues that apparently contains ten transmembrane
segments. Such a structure probably confers to secY a 'translocator' function, providing a channel for periplasmic and
outer-membrane precursor proteins. Homologs of secY are found in archaebacteria [2]. SecY is also encoded in the
chloroplast genome of some algae [3], where it could be involved in a prokaryotic-like protein export system across the
two membranes of the chloroplast endoplasmic reticulum (CER) which is present in chromophyte and cryptophyte algae.
Two signature patterns have been developed for secY proteins. The first corresponds to the second transmembrane
region, which is the most conserved section of these proteins. The second spans the C-terminal part of the fourth
transmembrane region, a short intracellular loop, and the N-terminal part of the fifth transmembrane region.
Consensus pattern: [GST]-[LIVMF](2)-x-[LIVM]-G-[LIVM]-x-P-[LI^^
(2) 

Consensus pattern: [LIVMFYW](2)-x-[DE]-x-[LIVMF]-[STN]-x(2)-G-[LIVMF]-[GST]-[NST]-G-x-[GST]-[LIVMF](3)
[1] Ito K. Mol. Microbiol. 6:2423-2428(1992).
[2] Auer J., Spicker G., Boeck A. Biochimie 73:683-688(1991).
[3] Douglas S.E. FEBS Lett. 298:93-96(1992).

[1499] 621. (Seed protein) Small hydrophilic plant seed proteins signature. The following small hydrophilic plant seed
proteins are structurally related: - Arabidopsis thaliana proteins GEA1 and GEA6. - Cotton late embryogenesis
abundant (LEA) protein D-19. - Carrot EMB-1 protein. - Barley LEA proteins B19.1A, B19.1B, B19.3 and B19.4. - Maize
late embryogenesis abundant protein Emb564. - Radish late seed maturation protein p8B6. - Rice embryonic abundant
protein Emp1. - Sunflower 10 Kd late embryogenesis abundant protein (DS10). - Wheat Em proteins. These proteins
contain from 83 to 153 amino acid residues and may play a role [1,2] in equipping the seed for survival, maintaining
a minimal level of hydration in the dry organism and preventing the denaturation of cytoplasmic components. They
may also play a role during imbibition by controlling water uptake. As a signature pattern, the best conserved region
in the sequence of these proteins has been selected; it is a glycine-rich nonapeptide located in the N-terminal section.
[1500] Consensus pattern: G-[EQ]-T-V-V-P-G-G-T-

[1501] [1] Dure L. III, Crouch M., Harada J., Ho T.-H.D., Mundy J., Quatrano R., Thomas T., Sung Z.R. Plant Mol.
Biol. 12:475-486(1989).
[2] Gaubier P., Raynal M., Hull G., Huestis G.M., Grellet F., Arenas C., Pages M., Delseny M.
Mol. Gen. Genet. 238:409-418(1993).

[1502] 622. Serine carboxypeptidases, active sites

All known carboxypeptidases are either metallo carboxypeptidases or serine carboxypeptidases. The catalytic activity
of the serine carboxypeptidases, like that of the trypsin family serine proteases, is provided by a charge relay system
involving an aspartic acid residue hydrogen-bonded to a histidine, which is itself hydrogen-bonded to a serine [1].
Proteins known to be serine carboxypeptidases are: - Barley and wheat serine carboxypeptidases I, II, and III [2].
- Yeast carboxypeptidase Y (YSCY) (gene PRC1), a vacuolar protease involved in degrading small peptides. - Yeast
KEX1 protease, involved in killer toxin and alpha-factor precursor processing. - Fission yeast sxa2, a probable
carboxypeptidase involved in degrading or processing mating pheromones [3]. - Penicillium janthinellum
carboxypeptidase S1 [4]. - Aspergillus niger carboxypeptidase pepF. - Aspergillus saitoi carboxypeptidase cpdS.
- Vertebrate protective protein / cathepsin A [5], a lysosomal protein which is not only a carboxypeptidase but also
essential for the activity of both beta-galactosidase and neuraminidase. - Mosquito vitellogenic carboxypeptidase
(VCP) [6]. - Naegleria fowleri virulence-related protein Nf314 [7]. - Yeast hypothetical protein YBR139w.
- Caenorhabditis elegans hypothetical proteins C08H9.1, F13D12.6, F32A5.3, F41C3.5 and K10B2.2. This family also
includes: - Sorghum (S)-hydroxymandelonitrile lyase (hydroxynitrile lyase) (HNL) [8], an enzyme involved in plant
cyanogenesis. The sequences surrounding the active site serine and histidine residues are highly conserved in all
these serine carboxypeptidases.

Consensus pattern: [LIVM]-x-[GTA]-E-S-Y-[AG]-[GS] [S is the active site residue]

Consensus pattern: [LIVF]-x(2)-[LIVSTA]-x-[IVPST]-x-[GSDNQL]-[SAGV]-[SG]-H-x-[IVAQ]-P-x(3)-[PSA] [H is the
active site residue]

[1] Liao D.I., Remington S.J. J. Biol. Chem. 265:6528-6531(1990).
[2] Sorensen S.B., Svendsen I., Breddam K. Carlsberg Res. Commun. 54:193-202(1989).
[3] Imai Y., Yamamoto M. Mol. Cell. Biol. 12:1827-1834(1992).
[4] Svendsen I., Hofmann T., Endrizzi J., Remington J., Breddam K. FEBS Lett. 333:39-43(1993).
[5] Galjart N.J., Morreau H., Willemsen R., Gillemans N., Bonten E.J., d'Azzo A. J. Biol. Chem. 266:14754-14762(1991).
[6] Cho W.L., Deitsch K.W., Raikhel A.S. Proc. Natl. Acad. Sci. U.S.A. 88:10821-10824(1991).
[7] Hu W.N., Kopachik W., Band R.N. Infect. Immun. 60:2418-2424(1992).
[8] Wajant H., Mundry K.W., Pfitzenmaier K. Plant Mol. Biol. 26:735-746(1994).
[9] Rawlings N.D., Barrett A.J. Meth. Enzymol. 244:19-61(1994).

[1503] 623. Serpins signature. Serpins (SERine Proteinase INhibitors) [1,2,3,4] are a group of structurally related
proteins. They are high molecular weight (400 to 500 amino acids), extracellular, irreversible serine protease inhibitors
with a well defined structural-functional characteristic: a reactive region that acts as a 'bait' for an appropriate serine
protease. This region is found in the C-terminal part of these proteins. Proteins which are known to belong to the serpin
family are listed below (references are only provided for recently determined sequences): - Alpha-1 protease inhibitor
(alpha-1-antitrypsin, contrapsin). - Alpha-1-antichymotrypsin. - Antithrombin III. - Alpha-2-antiplasmin. - Heparin
cofactor II. - Complement C1 inhibitor. - Plasminogen activator inhibitors 1 (PAI-1) and 2 (PAI-2). - Glia derived nexin
(GDN) (Protease nexin I). - Protein C inhibitor. - Rat hepatocytes SPI-1, SPI-2 and SPI-3 inhibitors. - Human squamous
cell carcinoma antigen (SCCA), which may act in the modulation of the host immune response against tumor cells. - A
lepidopteran protease inhibitor. - Leukocyte elastase inhibitor which, in contrast to other serpins, is an intracellular
protein. - Neuroserpin [5], a neuronal inhibitor of plasminogen activators and plasmin. - Cowpox virus crmA [6], an
inhibitor of the thiol protease interleukin-1B converting enzyme (ICE). CrmA is the only serpin known to inhibit a
non-serine proteinase. - Some orthopoxviruses probable protease inhibitors, which may be involved in the regulation of
the blood clotting cascade and/or of the complement cascade in the mammalian host. On the basis of strong sequence
similarities, a number of proteins with no known inhibitory activity are said to belong to this family: - Birds ovalbumin
and the related genes X and Y proteins. - Angiotensinogen; the precursor of the angiotensin active peptide. - Barley
protein Z; the major endosperm albumin. - Corticosteroid binding globulin (CBG). - Thyroxine-binding globulin (TBG).
- Sheep uterine milk protein (UTMP) and pig uteroferrin-associated protein (UFAP). - Hsp47, an endoplasmic reticulum
heat-shock protein that binds strongly to collagen and could act as a chaperone in the collagen biosynthetic pathway
[7]. - Maspin, which seems to function as a tumor suppressor [5]. - Pigment epithelium-derived factor precursor (PEDF),
a protein with a strong neurotrophic activity [8]. - Ep45, an estrogen-regulated protein from Xenopus [9]. A signature
pattern has been developed for this family of proteins, centered on a well conserved Pro-Phe sequence which is found
ten to fifteen residues on the C-terminal side of the reactive bond.

[1504] Consensus pattern: [LIVMFY]-x-[LIVMFYAC]-[DNQ]-[RKHQS]-[PST]-F-[LIVMFY]-[LIVMFYC]-x-[LIVMFAH]-
[1] Carrell R., Travis J. Trends Biochem. Sci. 10:20-24(1985).
[2] Carrell R., Pemberton P.A., Boswell D.R. Cold Spring Harbor Symp. Quant. Biol. 52:527-535(1987).
[3] Huber R., Carrell R.W. Biochemistry 28:8951-8966(1989).
[4] Remold-O'Donnell E. FEBS Lett. 315:105-108(1993).
[5] Osterwalder T., Contartese J., Stoeckli E.T., Kuhn T.B., Sonderegger P. EMBO J. 15:2944-2953(1996).
[6] Komiyama T., Ray C.A., Pickup D.J., Howard A.D., Thornberry N.A., Peterson E.P., Salvesen G. J. Biol. Chem.
269:19331-19337(1994).
[7] Clarke E., Sanwal B.D. Biochim. Biophys. Acta 1129:246-248(1992).
[8] Zou Z., Anisowicz A., Neveu M., Rafidi K., Sheng S., Sager R., Hendrix M.J., Seftor E., Thor A. Science
263:526-529(1994).
[9] Steele F.R., Chader G.J., Johnson L.V., Tombran-Tink J. Proc. Natl. Acad. Sci. U.S.A. 90:1526-1530(1993).
[10] Holland L.J., Suksang C., Wall A.A., Roberts L.R., Moser D.R., Bhattacharya A. J. Biol. Chem.
267:7053-7059(1992).

[1505] 624. Sigma-54 interaction domain signatures and profile

Some bacterial regulatory proteins activate the expression of genes from promoters recognised by core RNA
polymerase associated with the alternative sigma-54 factor. These have a conserved domain of about 230 residues
involved in the ATP-dependent [1,2] interaction with sigma-54. This domain has been found in the proteins listed below:
- acoR from Alcaligenes eutrophus, an activator of the acetoin catabolism operon acoXABC. - algB from Pseudomonas
aeruginosa, an activator of alginate biosynthetic gene algD. - dctD from Rhizobium, an activator of dctA, the
C4-dicarboxylate transport protein. - dhaR from Citrobacter freundii, a regulator of the dha operon for glycerol
utilization. - fhlA from Escherichia coli, an activator of the formate dehydrogenase H and hydrogenase III structural
genes. - flbD from Caulobacter crescentus, an activator of flagellar genes. - hoxA from Alcaligenes eutrophus, an
activator of the hydrogenase operon. - hrpS from Pseudomonas syringae, an activator of hrpD as well as other hrp loci
involved in plant pathogenicity. - hupR1 from Rhodobacter capsulatus, an activator of the [NiFe] hydrogenase genes
hupSL. - hydG from Escherichia coli and Salmonella typhimurium, an activator of the hydrogenase activity. - levR from
Bacillus subtilis, which regulates the expression of the levanase operon (levDEFG and sacC). - nifA (as well as anfA and
vnfA) from various bacteria, an activator of the nif nitrogen-fixing operon. - ntrC, from various bacteria, an activator of
nitrogen assimilatory genes such as that for glutamine synthetase (glnA) or of the nif operon. - pgtA from Salmonella
typhimurium, the activator of the inducible phosphoglycerate transport system. - pilR from Pseudomonas aeruginosa, an
activator of pilin gene transcription. - rocR from Bacillus subtilis, an activator of genes for arginine utilization. - tyrR
from Escherichia coli, involved in the transcriptional regulation of aromatic amino-acid biosynthesis and transport.
- wtsA, from Erwinia stewartii, an activator of plant pathogenicity gene wtsB. - xylR from Pseudomonas putida, the
activator of the tol plasmid xylene catabolism operon xylCAB and of xylS. - Escherichia coli hypothetical protein yfhA.
- Escherichia coli hypothetical protein yhgB. About half of these proteins (algB, dctD, flbD, hoxA, hupR1, hydG, ntrC,
pgtA and pilR) belong to signal transduction two-component systems [3] and possess a domain that can be
phosphorylated by a sensor-kinase protein in their N-terminal section. Almost all of these proteins possess a
helix-turn-helix DNA-binding domain in their C-terminal section. The domain which interacts with the sigma-54 factor
has an ATPase activity. This may be required to promote a conformational change necessary for the interaction [4]. The
domain contains an atypical ATP-binding motif A (P-loop) as well as a form of motif B. The two ATP-binding motifs are
located in the N-terminal section of the domain; signature patterns have been developed for both motifs. Other regions
of the domain are also conserved. One of them, located in the C-terminal section, has been selected as a third
signature pattern.
Consensus pattern: [LIVMFY](3)-x-G-[DEQ]-[STE]-G-[STAV]-G-K-x(2)-[LIVMFY] 
Consensus pattern: [GS]-x-[LIVMF]-x(2)-A-[DNEQASH]-[GNEK]-G-[STIM]-[LIVMFY](3)-[DE]-[EK]-[LIVM]
Consensus pattern: [FYW]-P-[GS]-N-[LIVM]-R-[EQ]-L-x-[NHAT] 

[1] Morett E., Segovia L. J. Bacteriol. 175:6067-6074(1993).
[2] Austin S., Kundrot C., Dixon R. Nucleic Acids Res. 19:2281-2287(1991).
[3] Albright L.M., Huala E., Ausubel F.M. Annu. Rev. Genet. 23:311-336(1989).
[4] Austin S., Dixon R. EMBO J. 11:2219-2228(1992).
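
A family with several signatures, such as the sigma-54 interaction domain, is usually checked by scanning a sequence with each pattern and recording where each one hits. The sketch below shows one way to do that, with the three consensus patterns translated by hand into regular expressions. The bracket boundaries used for the second pattern are an assumed reading of a line that is damaged in this text, and the dictionary keys, function name, and test sequence are hypothetical.

```python
import re

# Hand translations of the three sigma-54 interaction domain patterns.
# The element boundaries in pattern_2 are an assumed reading.
SIGMA54_SIGNATURES = {
    "pattern_1": re.compile(r"[LIVMFY]{3}.G[DEQ][STE]G[STAV]GK.{2}[LIVMFY]"),
    "pattern_2": re.compile(
        r"[GS].[LIVMF].{2}A[DNEQASH][GNEK]G[STIM][LIVMFY]{3}[DE][EK][LIVM]"
    ),
    "pattern_3": re.compile(r"[FYW]P[GS]N[LIVM]R[EQ]L.[NHAT]"),
}


def signature_hits(sequence):
    """Map each signature name to the 1-based start positions of its matches."""
    return {
        name: [m.start() + 1 for m in regex.finditer(sequence)]
        for name, regex in SIGMA54_SIGNATURES.items()
    }


# Made-up sequence carrying a pattern_1 motif and a pattern_3 motif only.
example = "MS" + "LLLAGESGTGKAAL" + "DDDD" + "YPGNVRELEN"
hits = signature_hits(example)
```

In this toy input the first and third patterns match and the second does not; how many signatures must hit before a protein is assigned to the family is a policy choice that the text above does not specify.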

[1506] 625. Sigma-70 factors family signatures 

Sigma factors [1] are bacterial transcription initiation factors that promote the attachment of the core RNA polymerase
to specific initiation sites and are then released. They alter the specificity of promoter recognition. Most bacteria
express a multiplicity of sigma factors. Two of these factors, sigma-70 (gene rpoD), generally known as the major or
primary sigma factor, and sigma-54 (gene rpoN or ntrA) direct the transcription of a wide variety of genes. The other
sigma factors, known as alternative sigma factors, are required for the transcription of specific subsets of genes. With
regard to sequence similarity, sigma factors can be grouped into two classes: the sigma-54 and sigma-70 families. The
sigma-70 family includes, in addition to the primary sigma factor, a wide variety of sigma factors, some of which are
listed below: - Bacillus sigma factors involved in the control of sporulation-specific genes: sigma-E (sigE or spoIIGB),
sigma-F (sigF or spoIIAC), sigma-G (sigG or spoIIIG), sigma-H (sigH or spoOC) and sigma-K (sigK or spoIVCB/spoIIIC).
- Escherichia coli and related bacteria sigma-32 (gene rpoH or htpR) involved in the expression of heat shock genes.
- Escherichia coli and related bacteria sigma-27 (gene fliA) involved in the expression of the flagellin gene.
- Escherichia coli sigma-S (gene rpoS or katF) which seems to be involved in the expression of genes required for
protection against external stresses. - Myxococcus xanthus sigma-B (sigB) which is essential for the late-stage
differentiation of that bacteria. Alignments of the sigma-70 family permit the identification of four regions of high
conservation [2,3]. Each of these four regions can in turn be subdivided into a number of sub-regions. Signature
patterns based on the two best-conserved sub-regions have been developed. The first pattern corresponds to
sub-region 2.2; the exact function of this sub-region is not known, although it could be involved in the binding of the
sigma factor to the core RNA polymerase. The second pattern corresponds to sub-region 4.2, which seems to harbor a
DNA-binding 'helix-turn-helix' motif involved in binding the conserved -35 region of promoters recognized by the major
sigma factors. The second pattern starts one residue before the N-terminal extremity of the HTH region and ends six
residues after its C-terminal extremity.
Consensus pattern: [DE]-[LIVMF](2)-[HEQS]-x-G-x-[LIVMFA]-G-L-[LIVMFYE]-x-[GSAM]-[LIVMAP]
Consensus pattern: [STN]-x(2)-[DEQ]-[LIVM]-[GAS]-x(4)-[LIVMF]-[PSTG]-x(3)-[LIVMA]-x-[NQR]-[LIVMA]-[EQH]-x
(3)-[LIVMFW]-x(2)-[LIVM]

[1] Helmann J.D., Chamberlin M.J. Annu. Rev. Biochem. 57:839-872(1988).
[2] Gribskov M., Burgess R.R. Nucleic Acids Res. 14:6745-6763(1986).
[3] Lonetto M.A., Gribskov M., Gross C.A. J. Bacteriol. 174:3843-3849(1992).
[4] Lonetto M.A., Brown K.L., Rudd K.E., Buttner M.J. Proc. Natl. Acad. Sci. U.S.A. 91:7573-7577(1994).
[1507] 626. Signal carboxyl-terminal domain. 430 members. 
[1508] 627. Signal peptidases I signatures

Signal peptidases (SPases) [1] (also known as leader peptidases) remove the signal peptides from secretory proteins.
In prokaryotes three types of SPases are known: type I (gene lepB) which is responsible for the processing of the
majority of exported pre-proteins; type II (gene lsp) which only processes lipoproteins; and a third type involved in
the processing of pili subunits. SPase I is an integral membrane protein that is anchored in the cytoplasmic membrane
by one (in B. subtilis) or two (in E. coli) N-terminal transmembrane domains with the main part of the protein
protruding in the periplasmic space. Two residues have been shown [2,3] to be essential for the catalytic activity of
SPase I: a serine and a lysine. SPase I is evolutionarily related to the yeast mitochondrial inner membrane protease
subunits 1 and 2 (genes IMP1 and IMP2) which catalyze the removal of signal peptides required for the targeting of
proteins from the mitochondrial matrix, across the inner membrane, into the inter-membrane space [4]. In eukaryotes
the removal of signal peptides is effected by an oligomeric enzymatic complex composed of at least five subunits: the
signal peptidase complex (SPC). The SPC is located in the endoplasmic reticulum membrane. Two components of mammalian
SPC, the 18 Kd (SPC18) and the 21 Kd (SPC21) subunits, as well as the yeast SEC11 subunit have been shown [5] to share
regions of sequence similarity with prokaryotic SPases I and yeast IMP1/IMP2. Three signature patterns for these
proteins have been developed. The first signature contains the putative active site serine, the second signature
contains the putative active site lysine which is not conserved in the SPC subunits, and the third signature
corresponds to a conserved region of unknown biological significance which is located in the C-terminal section of
all these proteins.
Consensus pattern: [GS]-x-S-M-x-[PS]-[AT]-[LF] [S is an active site residue] 

Consensus pattern: K-R-[LIVMSTA](2)-G-x-[PG]-G-[DE]-x-[LIVM]-x-[LIVMFY] [K is an active site residue] 
Consensus pattern: [LIVMFYW](2)-x(2)-G-D-[NH]-x(3)-[SND]-x(2)-[SG] 
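
The consensus patterns above use a compact notation: residues are separated by hyphens, square brackets list the residues accepted at a position, x stands for any residue, and a number or range in parentheses gives a repeat count. A minimal sketch of how such a pattern might be turned into a regular expression is shown below. The function name prosite_to_regex and the test peptide are illustrative assumptions, not part of the source, and the sketch covers only the notation elements that appear in these entries (plus the curly-brace exclusion form).

```python
import re

# One pattern element: a residue, an 'x', a [..] set or a {..} exclusion,
# optionally followed by a repeat count such as (2) or (4,6).
_ELEMENT = re.compile(r"(\[[A-Z]+\]|\{[A-Z]+\}|[A-Zx])(?:\((\d+)(?:,(\d+))?\))?")

def prosite_to_regex(pattern: str) -> str:
    """Translate a hyphen-separated consensus pattern into a Python regex string."""
    parts = []
    for element in pattern.split("-"):
        m = _ELEMENT.fullmatch(element)
        if m is None:
            raise ValueError(f"unrecognised pattern element: {element!r}")
        residue, low, high = m.groups()
        if residue == "x":
            residue = "."                              # any residue
        elif residue.startswith("{"):
            residue = "[^" + residue[1:-1] + "]"       # any residue except those listed
        if low is None:
            repeat = ""
        elif high is None:
            repeat = "{" + low + "}"
        else:
            repeat = "{" + low + "," + high + "}"
        parts.append(residue + repeat)
    return "".join(parts)

# The first signal peptidase I signature, applied to a made-up peptide.
serine_signature = prosite_to_regex("[GS]-x-S-M-x-[PS]-[AT]-[LF]")
hit = re.search(serine_signature, "AAGTSMEPTLAA")
```

Here hit.start() gives the zero-based offset of the match in the toy peptide; real use would scan full protein sequences rather than a short invented string.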

[1] Dalbey R.E., von Heijne G. Trends Biochem. Sci. 17:474-478(1992). [2] Sung M., Dalbey R.E. J. Biol. Chem.
267:13154-13159(1992). [3] Black M.T. J. Bacteriol. 175:4957-4961(1993). [4] Nunnari J., Fox T.D., Walter P. Science
262:1997-2004(1993). [5] van Dijl J.M., de Jong A., Vehmaanpera J., Venema G., Bron S. EMBO J. 11:2819-2828(1992).
[6] Rawlings N.D., Barrett A.J. Meth. Enzymol. 244:19-61(1994). [E1]
[1509] 628. (sodcu) Copper/Zinc superoxide dismutase signatures 
Copper/Zinc superoxide dismutase (SODC) [1] is one of the three forms of an enzyme that catalyzes the dismutation
of superoxide radicals. SODC binds one atom each of zinc and copper. Various forms of SODC are known: a cytoplasmic
form in eukaryotes, an additional chloroplast form in plants, an extracellular form in some eukaryotes, and a
periplasmic form in prokaryotes. The metal binding sites are conserved in all the known SODC sequences [2]. Two
signature patterns have been derived for this family of enzymes: the first one contains two histidine residues that
bind the copper atom; the second one is located in the C-terminal section of SODC and contains a cysteine which is
involved in a disulfide bond.
Consensus pattern: [GA]-[IMFAT]-H-[LIVF]-H-x(2)-[GP]-[SDG]-x-[STAGDE] [The two H's are copper ligands]

Consensus pattern: G-[GN]-[SGA]-G-x-R-x-[SGA]-C-x(2)-[IV] [C is involved in a disulfide bond] 
[1] Bannister J.V., Bannister W.H., Rotilio G. CRC Crit. Rev. Biochem. 22:111-154(1987). [2] Smith M.W.,
Doolittle R.F. J. Mol. Evol. 34:175-184(1992).

[1510] 629. (sodfe) Manganese and iron superoxide dismutases signature 

Manganese superoxide dismutase (SODM) [1] is one of the three forms of an enzyme that catalyzes the dismutation 
of superoxide radicals. The four ligands of the manganese atom are conserved in all the known SODM sequences. 
These metal ligands are also conserved in the related iron form of superoxide dismutases [2,3]. A short conserved
region which includes two of the four ligands, an aspartate and a histidine, has been selected as a signature.
Consensus pattern: D-x-W-E-H-[STA]-[FY](2) [D and H are manganese/iron ligands]

[1] Bannister J.V., Bannister W.H., Rotilio G. CRC Crit. Rev. Biochem. 22:111-154(1987). [2] Parker M.W.,
Blake C.C.F. FEBS Lett. 229:377-382(1988). [3] Smith M.W., Doolittle R.F. J. Mol. Evol. 34:175-184(1992).
[1511] 630. Spectrin repeat

[1512] Spectrin repeats are found in several proteins involved in cytoskeletal structure. These include spectrin,
alpha-actinin and dystrophin. The sequence repeat used in this family is taken from the structural repeat in
reference [2]. The spectrin repeat forms a three helix bundle. The second helix is interrupted by proline in some
sequences.
Number of members: 898

[1] Actin-binding proteins. 1: Spectrin super family. Hartwig JH; Protein Profile 1995;2:732-732.
[2] Crystal structure of the repetitive segments of spectrin. Yan Y, Winograd E, Viel A, Cronin T, Harrison SC,
Branton D; Science 1993;262:2027-2030.

[1513] 631. (sublilase) Streptomyces subtilisin-type inhibitors signature 

Bacteria of the Streptomyces family produce a family of proteinase inhibitors [1] characterized by their strong
activity toward subtilisin. They are collectively known as SSI's: Streptomyces Subtilisin Inhibitors. Some SSI's also
inhibit trypsin or chymotrypsin. In their mature secreted form, SSI's are proteins of about 110 residues with two
conserved disulfide bonds.

xxxxxxxxxxxxxxCxxxxxxxCxxxxxxxxxCx#xxxxxxxxxxxxCxxxxxx************
'C': conserved cysteine involved in a disulfide bond.
'#': active site residue.
'*': position of the pattern.
Consensus pattern: C-x-P-x(2,3)-G-x-H-P-x(4)-A-C-[ATD]-x-L [The two C's are involved in a disulfide bond]
[1] Taguchi S., Kojima S., Terabe M., Miura K.-I., Momose H. Eur. J. Biochem. 220:911-918(1994).
[1514] 632. Sugar transport proteins signatures 

In mammalian cells the uptake of glucose is mediated by a family of closely related transport proteins which are
called the glucose transporters [1,2,3]. At least seven of these transporters are currently known to exist (in human
they are encoded by the GLUT1 to GLUT7 genes). These integral membrane proteins are predicted to comprise twelve
membrane spanning domains. The glucose transporters show sequence similarities [4,5] with a number of other sugar or
metabolite transport proteins listed below (references are only provided for recently determined sequences).
- Escherichia coli arabinose-proton symport (araE).
- Escherichia coli galactose-proton symport (galP).
- Escherichia coli and Klebsiella pneumoniae citrate-proton symport (also known as citrate utilization determinant)
(gene cit).
- Escherichia coli alpha-ketoglutarate permease (gene kgtP).
- Escherichia coli proline/betaine transporter (gene proP) [6].
- Escherichia coli xylose-proton symport (xylE).
- Zymomonas mobilis glucose facilitated diffusion protein (gene glf).
- Yeast high and low affinity glucose transport proteins (genes SNF3, HXT1 to HXT14).
- Yeast galactose transporter (gene GAL2).
- Yeast maltose permeases (genes MAL3T and MAL6T).
- Yeast myo-inositol transporters (genes ITR1 and ITR2).
- Yeast carboxylic acid transporter protein homolog JEN1.
- Yeast inorganic phosphate transporter (gene PHO84).
- Kluyveromyces lactis lactose permease (gene LAC12).
- Neurospora crassa quinate transporter (gene Qa-y), and Emericella nidulans quinate permease (gene qutD).
- Chlorella hexose carrier (gene HUP1).
- Arabidopsis thaliana glucose transporter (gene STP1).
- Spinach sucrose transporter.
- Leishmania donovani transporters D1 and D2.
- Leishmania enriettii probable transport protein (LTP).
- Yeast hypothetical proteins YBR241c, YCR98c and YFL040w.
- Caenorhabditis elegans hypothetical protein ZK637.1.
- Escherichia coli hypothetical proteins yabE, ydjE and yhjE.
- Haemophilus influenzae hypothetical proteins HI0281 and HI0418.
- Bacillus subtilis hypothetical proteins yxbC and yxdF.

It has been suggested [4] that these transport proteins have evolved from the duplication of an ancestral protein
with six transmembrane regions; this hypothesis is based on the conservation of two G-R-[KR] motifs. The first one is
located between the second and third transmembrane domains and the second one between transmembrane domains 8 and 9.
Two patterns have been developed to detect this family of proteins. The first pattern is based on the G-R-[KR] motif;
but because this motif is too short to be specific to this family of proteins, a pattern from a larger region
centered on the second copy of this motif was derived. The second pattern is based on a number of conserved residues
which are located at the end of the fourth transmembrane segment and in the short loop region between the fourth and
fifth segments.

Consensus pattern: [LIVMSTAG]-[LIVMFSAG]-x(2)-[LIVMSA]-[DE]-x-[LIVMFYWA]-G-R-[RK]-x(4,6)-[GSTA]
Consensus pattern: [LIVMF]-x-G-[LIVMFA]-x(2)-G-x(8)-[LIFY]-x(2)-[EQ]-x(6)-[RK]
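
The remark that the G-R-[KR] motif is too short to be specific can be made concrete with a rough back-of-the-envelope calculation. The sketch below assumes, purely for illustration, that all 20 amino acids are equally likely and independent at every position, which real proteins do not satisfy; the function names and the 500-residue protein length are hypothetical choices. For the longer pattern only its fixed-length part up to G-R-[RK] is counted, so the figure obtained is an upper bound on the chance-match probability of the full pattern.

```python
from fractions import Fraction

AMINO_ACIDS = 20  # assumption: uniform, independent residue frequencies

def match_probability(allowed_counts):
    """Chance that a random window matches, given the number of allowed residues per position."""
    p = Fraction(1)
    for n in allowed_counts:
        p *= Fraction(n, AMINO_ACIDS)
    return p

def expected_hits(allowed_counts, protein_length):
    """Expected number of chance matches in one random protein of the given length."""
    windows = max(protein_length - len(allowed_counts) + 1, 0)
    return windows * match_probability(allowed_counts)

# G-R-[KR]: one, one and two allowed residues.
short_motif = [1, 1, 2]
p_short = match_probability(short_motif)            # 1/4000
hits_short = expected_hits(short_motif, 500)

# Fixed-length part of the first pattern:
# [LIVMSTAG]-[LIVMFSAG]-x(2)-[LIVMSA]-[DE]-x-[LIVMFYWA]-G-R-[RK]
long_prefix = [8, 8, 20, 20, 6, 2, 20, 8, 1, 1, 2]
p_long = match_probability(long_prefix)

print(float(hits_short), float(p_long))
```

Under these simplifying assumptions the three-residue motif is expected to turn up by chance in roughly one out of every eight random 500-residue proteins, while the extended pattern is several hundred times less likely to match at any given position, which is consistent with the choice of a larger region for the signature.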

[1] Silverman M. Annu. Rev. Biochem. 60:757-794(1991). [2] Gould G.W., Bell G.I. Trends Biochem. Sci. 15:18-23
(1990). [3] Baldwin S.A. Biochim. Biophys. Acta 1154:17-49(1993). [4] Maiden M.C.J., Davis E.O., Baldwin S.A.,
Moore D.C.M., Henderson P.J.F. Nature 325:641-643(1987). [5] Henderson P.J.F. Curr. Opin. Struct. Biol.
1:590-601(1991). [6] Culham D.E., Lasby B., Marangoni A.G., Milner J.L., Steer B.A., van Nues R.W., Wood J.M.
J. Mol. Biol. 229:268-276(1993).

[1515] 633. Synaptobrevin signature 

Synaptobrevin [1] is an intrinsic membrane protein of small synaptic vesicles whose function is not yet known, but
which is highly conserved in mammals, electric ray (where it is known as VAMP-1), Drosophila and yeast [2]. In yeast
there are two closely related forms of synaptobrevin (genes SNC1 and SNC2) while in mammals there are at least 4
(genes SYB1, SYB2, SYB3 and SYBL1). Structurally, synaptobrevin consists of an N-terminal cytoplasmic domain of from
90 to 110 residues, followed by a transmembrane region, and then by a short (from 2 to 22 residues) C-terminal
intravesicular domain. As a signature pattern for synaptobrevin, a highly conserved stretch of residues located in
the central part of the sequence was selected.

Consensus pattern: N-[LIVM]-[DENS]-[KL]-V-x-[DEQ]-R-x(2)-[KRHLIVM]-[STDE]-x-[LIVM]-x-[DE]-[KR]-[TA]-[DE]
[1] Suedhof T.C., Baumert M., Perin M.S., Jahn R. Neuron 2:1475-1481(1989). [2] Gerst J.E., Rodgers L., Riggs M.,
Wigler M. Proc. Natl. Acad. Sci. U.S.A. 89:4338-4342(1992).

[1516] 634. TBC domain. Identification of a TBC domain in GYP6_YEAST and GYP7_YEAST, which are GTPase
activator proteins of yeast Ypt6 and Ypt7, implies that these domains are GTPase activator proteins of Rab-like small
GTPases. Number of members: 55

[1] Medline: 96032578. Molecular cloning of a cDNA with a novel domain present in the tre-2 oncogene and the
yeast cell cycle regulators BUB2 and cdc16. Richardson PM, Zon LI; Oncogene 1995;11:1139-1148.
[2] Medline: 97398935. A shared domain between a spindle assembly checkpoint protein and Ypt/Rab-specific
GTPase-activators. Neuwald AF; Trends Biochem Sci 1997;22:243-244.

[1517] 635. Transcription factor TFIID repeat signature (TBP)

Transcription factor TFIID (or TATA-binding protein, TBP) [1,2] is a general factor that plays a major role in the
activation of eukaryotic genes transcribed by RNA polymerase II. TFIID binds specifically to the TATA box promoter
element which lies close to the position of transcription initiation. There is a remarkable degree of sequence
conservation of a C-terminal domain of about 180 residues in TFIID from various eukaryotic sources. This region is
necessary and sufficient for TATA box binding. The most significant structural feature of this domain is the presence
of two conserved repeats of a 77 amino-acid region. The intramolecular symmetry generates a saddle-shaped structure
that sits astride the DNA [3]. Drosophila TRF (TBP-related factor) [4] is a sequence-specific transcription factor
that also binds to the TATA box and is highly similar to TFIID. Archaebacteria also possess a TBP homolog [5]. A
signature pattern that spans the last 50 residues of the repeated region has been derived.

Consensus pattern: Y-x-P-x(2)-[IF]-x(2)-[LIVM](2)-x-[KRH]-x(3)-P-[RKQ]-x(3)-L-[LIVM]-F-x-[STN]-G-[KR]-[LIVM]-x(3)-G-[TAGL]-[KR]-x(7)-[AGC]-x(7)-[LIVM]
[1] Hoffmann A., Sinn E., Yamamoto T., Wang J., Roy A., Horikoshi M., Roeder R.G. Nature 346:387-390(1990).
[2] Gasch A., Hoffmann A., Horikoshi M., Roeder R.G., Chua N.-H. Nature 346:390-394(1990). [3] Nikolov D.B.,
Hu S.-H., Lin J., Gasch A., Hoffmann A., Horikoshi M., Chua N.-H., Roeder R.G., Burley S.K. Nature 360:40-46(1992).
[4] Crowley T.E., Hoey T., Liu J.-K., Jan Y.N., Jan L.Y., Tjian R. Nature 361:557-561(1993). [5] Marsh T.L.,
Reich C.I., Whitelock R.B., Olsen G.J. Proc. Natl. Acad. Sci. U.S.A. 91:4180-4184(1994).

[1518] 636. Translationally controlled tumor protein signatures (TCTP)

Mammalian translationally controlled tumor protein (TCTP) (or P23) is a protein which has been found to be
preferentially synthesized in cells during the early growth phase of some types of tumor [1,2], but which is also
expressed in normal cells. The physiological function of TCTP is still not known. It is a hydrophilic protein of 18
to 20 Kd. Close homologs have been found in plants [3], earthworm [4], Caenorhabditis elegans (F52H2.11), Hydra,
budding yeast (YKL056c) [5] and fission yeast (SpAC1F12.02c). Two of the best conserved regions have been selected as
signature patterns for TCTP.
Consensus pattern: [IFA]-[GA]-[GAS]-N-[PAK]-S-[GA]-E-[GDE]-[PAGE]-[DEQGA]
Consensus pattern: [FLVH]-[FY]-[IVCT]-G-E-x-[MA]-x(2,5)-[DEN]-[GAST]-x-[LV]-[AV]-x(3)-[FYW]



[1] Boehm H., Benndorf R., Gaestel M., Gross B., Nuernberg P., Kraft R., Otto A., Bielka H. Biochem. Int. 19:277-286
(1989). [2] Makrides S., Chitpatima S.T., Bandyopadhyay R., Brawerman G. Nucleic Acids Res. 16:2350-2350(1988).
[3] Pay A., Heberle-Bors E., Hirt H. Plant Mol. Biol. 19:501-503(1992). [4] Stuerzenbaum S.R., Kille P., Morgan A.J.
Biochim. Biophys. Acta 1398:294-304(1998). [5] Rasmussen S.W. Yeast 10:S63-S68(1994).

[1519] 637. TFIIS zinc ribbon domain signature

Transcription factor S-II (TFIIS) [1] is a eukaryotic protein necessary for efficient RNA polymerase II transcription
elongation, past template-encoded pause sites. TFIIS shows DNA-binding activity only in the presence of RNA
polymerase II. It is a protein of about 300 amino acids whose sequence is highly conserved in mammals, Drosophila,
yeast (where it was first known as PPR2, a transcriptional regulator of URA4, and then as DST1, the DNA strand
transfer protein alpha [2]) and in the archaebacteria Sulfolobus acidocaldarius [3]. This family also includes the
eukaryotic and archaebacterial RNA polymerase subunits of the 15 Kd / M family (see <PDOC00790>) as well as the
following viral proteins: - Vaccinia virus RNA polymerase 30 Kd subunit (rpo30) [4]. - African swine fever virus
protein I243L [5]. The best conserved region of all these proteins contains four cysteines that bind a zinc ion and
fold in a conformation termed a 'zinc ribbon' [6]. Besides these cysteines, there are a number of other conserved
residues which can be used to help define a specific pattern for this type of domain.

Consensus pattern: C-x(2)-C-x(9)-[LIVMQSAR]-[QH]-[STQL]-[RA]-[SACR]-x-[DE]-[DET]-[PGSEA]-x(6)-C-x(2,5)-C-x(3)-[FW]
[The four C's are zinc ligands]

[1] Hirashima S., Hirai H., Nakanishi Y., Natori S. J. Biol. Chem. 263:3858-3863(1988). [2] Kipling D.,
Kearsey S.E. Nature 353:509-509(1991). [3] Langer D., Zillig W. Nucleic Acids Res. 21:2251-2251(1993). [4]
Ahn B.-Y., Gershon P.D., Jones E.V., Moss B. Mol. Cell. Biol. 10:5433-5441(1990). [5] Rodriguez J.M., Salas M.L.,
Vinuela E. Virology 186:40-52(1992). [6] Qian X., Jeon C., Yoon H., Agarwal K., Weiss M.A. Nature 365:277-279(1993).
[1520] 638. Tetrahydrofolate dehydrogenase/cyclohydrolase signatures (THF DHG CYH)

Enzymes that participate in the transfer of one-carbon units are involved in various biosynthetic pathways. In many
of these processes the transfers of one-carbon units are mediated by the coenzyme tetrahydrofolate (THF). Various
reactions generate one-carbon derivatives of THF which can be interconverted between different oxidation states by
formyltetrahydrofolate synthetase (EC 6.3.4.3), methylenetetrahydrofolate dehydrogenase (EC 1.5.1.5 or EC 1.5.1.15)
and methenyltetrahydrofolate cyclohydrolase (EC 3.5.4.9). The dehydrogenase and cyclohydrolase activities are
expressed by a variety of multifunctional enzymes:
- Eukaryotic C-1-tetrahydrofolate synthase (C1-THF synthase), which catalyzes all three reactions described above.
Two forms of C1-THF synthases are known [1]: one is located in the mitochondrial matrix, while the second one is
cytoplasmic. In both forms the dehydrogenase/cyclohydrolase domain is located in the N-terminal section of the 900
amino acid protein and consists of about 300 amino acid residues. The C1-THF synthases are NADP-dependent.
- Eukaryotic mitochondrial bifunctional dehydrogenase/cyclohydrolase [2]. This is a homodimeric NAD-dependent enzyme
of about 300 amino acid residues.
- Bacterial folD [3]. FolD is a homodimeric bifunctional NADP-dependent enzyme of about 290 amino acid residues.

The sequence of the dehydrogenase/cyclohydrolase domain is highly conserved in all forms of the enzyme. Two conserved
regions have been selected as signature patterns. The first one is located in the N-terminal part of these enzymes
and contains three acidic residues. The second pattern is a highly conserved sequence of 9 amino acids which is
located in the C-terminal section.

Consensus pattern: [EQ]-x-[EQK]-[LIVM](2)-x(2)-[LIVM]-x(2)-[LIVMY]-N-x-[DN]-x(5)-[LIVMF](3)-Q-L-P-[LV]

Consensus pattern: P-G-G-V-G-P-[MF]-T-[IV]

[1] Shannon K.W., Rabinowitz J.C. J. Biol. Chem. 263:7717-7725(1988). [2] Belanger C., Mackenzie R.E. J. Biol.
Chem. 264:4837-4843(1989). [3] d'Ari L., Rabinowitz J.C. J. Biol. Chem. 266:23953-23958(1991).
[1521] 639. Triosephosphate isomerase active site (TIM) 

Triosephosphate isomerase (EC 5.3.1.1) (TIM) [1] is the glycolytic enzyme that catalyzes the reversible
interconversion of glyceraldehyde 3-phosphate and dihydroxyacetone phosphate. TIM plays an important role in several
metabolic pathways and is essential for efficient energy production. It is a dimer of identical subunits, each of
which is made up of about 250 amino-acid residues. A glutamic acid residue is involved in the catalytic mechanism
[2]. The sequence around the active site residue is perfectly conserved in all known TIM's and can be used as a
signature pattern for this type of enzyme.

Consensus pattern: [AV]-Y-E-P-[LIVM]-W-[SA]-I-G-T-[GK] [E is the active site residue]
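
Because the pattern fixes the position of the catalytic glutamate within the match, a hit can be used to report where the putative active site residue lies in a sequence. The following is a minimal sketch under stated assumptions: the regular expression is a hand translation of the consensus pattern above, the helper name active_site_positions is hypothetical, and the peptide is made up for illustration rather than taken from any real TIM sequence.

```python
import re

# Hand translation of [AV]-Y-E-P-[LIVM]-W-[SA]-I-G-T-[GK]
TIM_SIGNATURE = re.compile(r"[AV]YEP[LIVM]W[SA]IGT[GK]")
ACTIVE_SITE_OFFSET = 2  # E is the third residue of the pattern (zero-based offset 2)

def active_site_positions(sequence):
    """Return 1-based positions of the glutamate in every signature match."""
    return [m.start() + ACTIVE_SITE_OFFSET + 1 for m in TIM_SIGNATURE.finditer(sequence)]

toy_peptide = "MKKAYEPVWAIGTGKT"  # invented sequence containing one match
positions = active_site_positions(toy_peptide)
print(positions)
```

A signature hit only suggests a candidate active site; it does not by itself establish that the protein is a functional triosephosphate isomerase.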

[1] Lolis E., Alber T., Davenport R.C., Rose D., Hartman F.C., Petsko G.A. Biochemistry 29:6609-6618(1990). [2]
Knowles J.R. Nature 350:121-124(1991).
[1522] 640. Thymidine kinase cellular-type signature (TK) 

Thymidine kinase (TK) (EC 2.7.1.21) is a ubiquitous enzyme that catalyzes the ATP-dependent phosphorylation of
thymidine. A comparison of TK sequences has shown [1,2,3] that there are two different families of TK. One family
groups together TK from herpes viruses as well as cellular thymidylate kinases, while the second family currently
consists of TK from the following sources: - Vertebrates. - Bacteria. - Bacteriophage T4. - Pox viruses. - African
swine fever virus (ASF). - Fish lymphocystis disease virus (FLDV). A conserved region which is located in the C-terminal




section of these enzymes has been selected as a signature pattern for this family of TK.

Consensus pattern: [GA]-x(1,2)-[DE]-x-Y-x-[STAP]-x-C-[NKR]-x-[CH]-[LIVMFYWH]
[1] Boyle D.B., Coupar B.E.H., Gibbs A.J., Seigman L.J., Both G.W. Virology 156:355-365(1987). [2] Blasco R.,
Lopez-Otin C., Munoz M., Bockamp E.-O., Simon-Mateo C., Vinuela E. Virology 178:301-304(1990). [3] Robertson G.R.,
Whalley J.M. Nucleic Acids Res. 16:11303-11317(1988).

[1523] 641. Thymidine kinase from herpesvirus (TK herpes) 
[1]
Medline: 96003730
Crystal structures of the thymidine kinase from herpes simplex virus type-1 in complex with deoxythymidine and
ganciclovir.
Brown DG, Visse R, Sandhu G, Davies A, Rizkallah PJ, Melitz C, Summers WC, Sanderson MR;
Nat Struct Biol 1995;2:876-881.
Number of members: 65 

[1524] 642. Nuclear transition protein 2 signatures (TP2) 

In mammals, the second stage of spermatogenesis is characterized by the conversion of nucleosomal chromatin to 
the compact, non-nucleosomal and transcriptionally inactive form found in the sperm nucleus. This condensation is 
associated with a double-protein transition. The first transition corresponds to the replacement of histones by several 
spermatid-specific proteins, also called transition proteins, which are themselves replaced by protamines during the 
second transition. Nuclear transition protein 2 (TP2) is one of those spermatid-specific proteins. TP2 is a basic, zinc- 
binding protein [1] of 116 to 137 amino-acid residues. Structurally, TP2 consists of three distinct parts: a conserved 
serine-rich N-terminal domain of about 25 residues, a variable central domain of 20 to 50 residues which contains 
cysteine residues, and a conserved C-terminal domain of about 70 residues rich in lysines and arginines. Two signature 
patterns for TP2 have been developed: one located in the N-terminal domain, the other in the C-terminal. 
Consensus pattern: H-x(3)-H-S-[NS]-S-x-P-Q-S 
Consensus pattern: K-x-R-K-x(2)-E-G-K-x(2)-K-[KR]-K 
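
Since one TP2 signature lies in the N-terminal domain and the other in the C-terminal domain, a stricter screen can require both to be present in that order. The sketch below illustrates the idea; the regular expressions are hand translations of the two consensus patterns above, while the function name and the test strings are hypothetical and do not come from real TP2 sequences.

```python
import re

# Hand translations of the two TP2 consensus patterns.
TP2_N_TERMINAL = re.compile(r"H.{3}HS[NS]S.PQS")         # H-x(3)-H-S-[NS]-S-x-P-Q-S
TP2_C_TERMINAL = re.compile(r"K.RK.{2}EGK.{2}K[KR]K")    # K-x-R-K-x(2)-E-G-K-x(2)-K-[KR]-K

def has_both_tp2_signatures(sequence):
    """True if the N-terminal signature occurs and the C-terminal one follows it."""
    n_hit = TP2_N_TERMINAL.search(sequence)
    if n_hit is None:
        return False
    # Only look for the C-terminal signature downstream of the N-terminal hit.
    return TP2_C_TERMINAL.search(sequence, n_hit.end()) is not None

n_part = "HAAAHSNSAPQS"      # made-up fragment matching the first pattern
c_part = "KARKAAEGKAAKRK"    # made-up fragment matching the second pattern
in_order = has_both_tp2_signatures(n_part + "G" * 20 + c_part)
reversed_order = has_both_tp2_signatures(c_part + "G" * 20 + n_part)
```

Requiring both hits trades sensitivity for specificity: a fragmentary sequence that carries only one of the two domains would be rejected by this check.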

[1] Baskaran R., Rao M.R.S. Biochem. Biophys. Res. Commun. 179:1491-1499(1991).
[1525] 643. Thiamine pyrophosphate enzymes signature (TPP enzymes)

A number of enzymes require thiamine pyrophosphate (TPP) (vitamin B1) as a cofactor. It has been shown [1] that
some of these enzymes are structurally related. These related TPP enzymes are:
- Pyruvate oxidase (POX) (EC 1.2.3.3). Reaction catalyzed: pyruvate + orthophosphate + O(2) + H(2)O = acetyl
phosphate + CO(2) + H(2)O(2).
- Pyruvate decarboxylase (PDC) (EC 4.1.1.1). Reaction catalyzed: pyruvate = acetaldehyde + CO(2).
- Indolepyruvate decarboxylase (EC 4.1.1.74) [2]. Reaction catalyzed: indole-3-pyruvate = indole-3-acetaldehyde +
CO(2).
- Acetolactate synthase (ALS) (EC 4.1.3.18). Reaction catalyzed: 2 pyruvate = acetolactate + CO(2).
- Benzoylformate decarboxylase (BFD) (EC 4.1.1.7) [3]. Reaction catalyzed: benzoylformate = benzaldehyde + CO(2).

A conserved region which is located in their C-terminal section has been selected as a signature pattern for these
enzymes.
Consensus pattern: [LIVMF]-[GSA]-x(5)-P-x(4)-[LIVMFYW]-x-[LIVMF]-x-G-D-[GSA]-[GSAC]

[1] Green J.B.A. FEBS Lett. 246:1-5(1989). [2] Koga J., Adachi T., Hidaka H. Mol. Gen. Genet. 226:10-18(1991). [3]
Tsou A.Y., Ransom S.C., Gerlt J.A., Buechter D.D., Babbitt P.C., Kenyon G.L. Biochemistry 29:9856-9862(1990).
[1526] 644. TPR Domain 

[1]
Medline: 95397415
Tetratrico peptide repeat interactions: to TPR or not to TPR?
Lamb JR, Tugendreich S, Hieter P;
Trends Biochem Sci 1995;20:257-259.
[2]
Medline: 98151343
The structure of the tetratricopeptide repeats of protein phosphatase 5: implications for TPR-mediated
protein-protein interactions.
Das AK, Cohen PW, Barford D;
EMBO J 1998;17:1192-1199.
Number of members: 621 

[1527] 645. Uroporphyrin-III C-methyltransferase signatures (TP methylase)

Uroporphyrin-III C-methyltransferase (EC 2.1.1.107) (SUMT) [1,2] catalyzes the transfer of two methyl groups from
S-adenosyl-L-methionine to the C-2 and C-7 atoms of uroporphyrinogen III to yield precorrin-2 via the intermediate
formation of precorrin-1.

SUMT is the first enzyme specific to the cobalamin pathway and precorrin-2 is a common intermediate in the
biosynthesis of corrinoids such as vitamin B12, siroheme and coenzyme F430. The sequences of SUMT from a variety of
eubacterial and archaebacterial species are currently available. In species such as Bacillus megaterium (gene cobA),
Pseudomonas denitrificans (cobA) or Methanobacterium ivanovii (gene corA) SUMT is a protein of about 25 to 30 Kd.
In Escherichia coli and related bacteria, the cysG protein, which is involved in the biosynthesis of siroheme, is a
multifunctional protein composed of an N-terminal domain, probably involved in transforming precorrin-2 into
siroheme, and a C-terminal domain which has SUMT activity. The sequence of SUMT is related to that of a number of
P. denitrificans and Salmonella typhimurium enzymes involved in the biosynthesis of cobalamin which also seem to be
SAM-dependent methyltransferases [3,4]. The similarity is especially strong with two of these enzymes: cobI/cbiL
which encodes S-adenosyl-L-methionine-precorrin-2 methyltransferase and cobM/cbiF whose exact function is not known.
Two signature patterns have been developed for these enzymes. The first corresponds to a well conserved region in the
N-terminal extremity (called region 1 in [1,3]) and the second to a less conserved region located in the central part
of these proteins (this pattern spans what are called regions 2 and 3 in [1,3]).

Consensus pattern: [LIVM]-[GS]-[STAL]-G-P-G-x(3)-[LIVMFY]-[LIVM]-T-[LIVM]-[KRHQG]-[AG]

Consensus pattern: V-x(2)-[LI]-x(2)-G-D-x(3)-[FYW]-[GS]-x(8)-[LIVF]-x(5,6)-[LIVMFYWPAC]-x-[LIVMY]-x-P-G

[1] Blanche F., Robin C., Couder M., Faucher D., Cauchois L., Cameron B., Crouzet J. J. Bacteriol. 173:4637-4645
(1991). [2] Robin C., Blanche F., Cauchois L., Cameron B., Couder M., Crouzet J. J. Bacteriol. 173:4893-4896(1991).
[3] Crouzet J., Cameron B., Cauchois L., Rigault S., Rouyez M.-C., Blanche F., Thibaut D., Debussche L. J. Bacteriol.
172:5980-5990(1990). [4] Roth J.R., Lawrence J.G., Rubenfield M., Kieffer-Higgins S., Church G.M. J. Bacteriol.
175:3303-3316(1993). [5] Mattheakis L.C., Shen W.H., Collier R.J. Mol. Cell. Biol. 12:4026-4037(1992).
[1528] 646. Tudor domain

Domain of unknown function present in several RNA-binding proteins, copies in the Drosophila Tudor protein. Slight
ambiguities in the alignment. Number of members: 18

[1] Medline: 97200561. Tudor domains in proteins that interact with RNA. Ponting CP; Trends Biochem Sci
1997;22:51-52.
[2] Medline: 97157029. The human EBNA-2 coactivator p100: multidomain organization and relationship to the
staphylococcal nuclease fold and to the tudor protein involved in Drosophila melanogaster development. Callebaut I,
Mornon JP; Biochem J 1997;321:125-132.
[1529] 647. Terpene synthase family

It has been suggested that this gene family be designated tps (for terpene synthase) [1]. It has been split into six
subgroups on the basis of phylogeny, called tpsa-tpsf.
tpsa includes vetispiradiene synthase Swiss:Q39979, 5-epi-aristolochene synthase Swiss:Q40577 and
(+)-delta-cadinene synthase Swiss:P93665.
tpsb includes (-)-limonene synthase, Swiss:Q40322.
tpsc includes kaurene synthase A, Swiss:O04408.
tpsd includes taxadiene synthase, Swiss:Q41594, pinene synthase, Swiss:O24475 and myrcene synthase, Swiss:O24474.
tpse includes kaurene synthase B.
tpsf includes linalool synthase.
Number of members: 51

[1]
Medline: 97413772
Monoterpene synthases from grand fir (Abies grandis). cDNA isolation, characterization, and functional expression of
myrcene synthase, (-)-(4S)-limonene synthase, and (-)-(1S,5S)-pinene synthase.
Bohlmann J, Steele CL, Croteau R;
J Biol Chem 1997;272:21784-21792.
[1530] 648. ThiF family 

This family contains a repeated domain in ubiquitin activating enzyme E1 and members of the bacterial
ThiF/MoeB/HesA family. Number of members: 87
[1531] 649. Thioester dehydrase

Members of this family are involved in fatty acid biosynthesis. 
Number of members: 19 
[1] 

Medline: 96398612 

Structure of a dehydratase-isomerase from the bacterial pathway for biosynthesis of unsaturated fatty acids: two
catalytic activities in one active site.
Leesong M, Henderson BS, Gillig JR, Schwab JM, Smith JL;
Structure 1996;4:253-264.

Database reference: SCOP; 1mka; fa; [SCOP-USA] [CATH-PDBSUM]
Database reference: PFAMB; PB058036;
[1532] 650. Tub family signatures 

The mouse tubby mutation is the cause of maturity-onset obesity, insulin resistance and sensory deficits. This
mutation maps to a gene, tub [1,2], which codes for a protein that belongs to a family which currently consists of
the following members: - Mammalian tub, a hydrophilic protein of about 500 residues, which could be involved in the
hypothalamic regulation of body weight. - Human protein TULP1 [3] which may be involved in retinitis pigmentosa 14, a
retinal degeneration disease. - Mouse protein p4-6 whose function is not known. - Caenorhabditis elegans hypothetical
protein F10B5.4. - Several fragmentary sequences from plants, Drosophila and human ESTs. While the N-terminal part of
these proteins is not conserved in length nor in sequence, the C-terminal 250 residues are highly conserved.
Therefore, two regions were selected in the C-terminal part as signature patterns. The second region is located at
the C-terminal extremity and contains a penultimate cysteine residue that could be critical to the normal functioning
of these proteins.

Consensus pattern: F-[KHQ]-G-R-V-[ST]-x-A-S-V-K-N-F-Q

Consensus pattern: A-F-[AG]-I-[SAC]-[LIVM]-[ST]-S-F-x-[GST]-K-x-A-C-E

[1] Kleyn P.W., Fan W., Kovats S.G., Lee J.L., Pulido J.C., Wu Y., Berkemeier L.R., Misumi D.J., Holmgren L.,
Charlat O., Woolf E.A., Tayber O., Brody T., Shu P., Hawkins F., Kennedy B., Baldini L., Ebeling C., Alperin G.D.,
Deeds J., Lakey N.D., Culpepper J., Chen H., Gluecksmann-Kuis M.A., Carlson G.A., Duyk G.M., Moore K.J. Cell
85:281-290(1996). [2] Noben-Trauth K., Naggert J.K., North M.A., Nishina P.M. Nature 380:534-538(1996). [3]
North M.A., Naggert J.K., Yan Y., Noben-Trauth K., Nishina P.M. Proc. Natl. Acad. Sci. U.S.A. 94:3128-3133(1997).
[1533] 651. Eukaryotic DNA topoisomerase I active site

DNA topoisomerase I (EC 5.99.1.2 ) [1,2,3,4,EJJ is one of the two types of enzyme that catalyze the interconversion 
of topological DNA isomers. Type Itopoisomerases act by catalyzing the transient breakage of DNA, one strand at a 
time, and the subsequent rejoining of the strands. When a eukaryotic type Itopoisom erase breaks a DNA backbone 

20 bond, it simultaneously forms a protein-DNA link where the hydroxyl group of a tyrosine residue is joined to a 3'- 
phosphate on DNA, at one end of the enzyme-severed DNA strand. In eukaryotes and pox virus topoisomerases I, 
there are a number of conserved residues in the region around the active site tyrosine. 
Consensus pattern: [DEN]-x(6)-[GS]-[IT]-S-K-x(2)-Y-[LIVM]-x(3)-[LI VM] [Y is the active site tyrosine) 
[1] Sternglanz R. Curr. Opin. Cell Biol. 1:533-535(1990). [2] Sharma A., Mondragon A. Curr. Opin. Struct. Biol. 5:39-47
(1995). [3] Lynn R.M., Bjornsti M.-A., Caron P.R., Wang J.C. Proc. Natl. Acad. Sci. U.S.A. 86:3559-3563(1989). [4]
Roca J. Trends Biochem. Sci. 20:156-160(1995). [E1]
[1534] 652. Transaldolase signatures 

Transaldolase (EC 2.2.1.2) catalyzes the reversible transfer of a three-carbon ketol unit from sedoheptulose 7-phosphate
to glyceraldehyde 3-phosphate to form erythrose 4-phosphate and fructose 6-phosphate. This enzyme, together
with transketolase, provides a link between the glycolytic and pentose-phosphate pathways. Transaldolase is an enzyme
of about 34 Kd whose sequence has been well conserved throughout evolution. A lysine has been implicated
[1] in the catalytic mechanism of the enzyme; it acts as a nucleophilic group that attacks the carbonyl group of fructose
6-phosphate. Transaldolase is evolutionarily related [2] to a bacterial protein of about 20 Kd (known as talC in Escherichia
coli), whose exact function is not yet known. Two signature patterns have been developed for these proteins. The first,
located in the N-terminal section, contains a perfectly conserved pentapeptide; the second includes the active site
lysine.

Consensus pattern: [DG]-[IVSA]-T-[ST]-N-P-[STA]-[LIVMF](2) 

Consensus pattern: [LIVM]-x-[LIVM]-K-[LIVM]-[PAS]-x-[ST]-x-[DENQPAS]-G-[LIVM]-x-[AGV]-x-[QEKRST]-x-[LIVM]
[K is the active site residue] 
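
The consensus patterns in this section use a compact notation: a bracketed set lists the residues allowed at a position, braces list excluded residues, x stands for any residue, and a number in parentheses gives a repeat count or range. As a minimal sketch, and not part of the original disclosure, the Python function below translates such a pattern into a regular expression. The function name `prosite_to_regex` and the short test sequence are hypothetical and chosen only for illustration.

```python
import re

def prosite_to_regex(pattern: str) -> str:
    """Translate a PROSITE-style consensus pattern into a Python regex string.

    Hypothetical helper for illustration; it handles only the elements used
    in the patterns of this section: residues, [allowed], {excluded}, x, (n), (n,m).
    """
    parts = []
    for element in pattern.replace(" ", "").split("-"):
        # Separate the residue specification from an optional repeat suffix.
        m = re.fullmatch(r"(.+?)(?:\((\d+)(?:,(\d+))?\))?", element)
        if m is None:
            raise ValueError(f"cannot parse pattern element: {element!r}")
        spec, lo, hi = m.groups()
        if spec in ("x", "X"):
            token = "."                          # any residue
        elif spec.startswith("[") and spec.endswith("]"):
            token = spec                         # allowed residues
        elif spec.startswith("{") and spec.endswith("}"):
            token = "[^" + spec[1:-1] + "]"      # excluded residues
        elif len(spec) == 1 and spec.isalpha():
            token = spec                         # one fixed residue
        else:
            raise ValueError(f"cannot parse pattern element: {element!r}")
        if lo is not None:
            token += "{" + lo + ("," + hi if hi else "") + "}"
        parts.append(token)
    return "".join(parts)

# First transaldolase pattern, applied to a made-up sequence (assumption).
regex = prosite_to_regex("[DG]-[IVSA]-T-[ST]-N-P-[STA]-[LIVMF](2)")
hit = re.search(regex, "MKGVTTNPSLIA")
print(regex, hit.start(), hit.group())
```

A regular expression is used here only because the pattern notation maps onto it element by element; a dedicated pattern scanner would serve equally well.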

[1] Miosga T., Schaaff-Gerstenschlaeger I., Franken E., Zimmermann F.K. Yeast 9:1241-1249(1993). [2] Reizer J.,
Reizer A., Saier M.H. Jr. Microbiology 141:961-971(1995).
[1535] 653. (Transpeptidase) Penicillin binding protein transpeptidase domain 

[1536] The active site serine (residue 337 in Swiss:P14677) is conserved in all members of this family.
[1537] [1] Pares S, Mouz N, Petillot Y, Hakenbeck R, Dideberg O; Nat Struct Biol 1996;3:284-289.

[1538] 654. Trehalase signatures

Trehalase (EC 3.2.1.28) is the enzyme responsible for the degradation of the disaccharide alpha,alpha-trehalose,
yielding two glucose subunits [1]. It is an enzyme found in a wide variety of organisms and whose sequence has been
highly conserved throughout evolution. Two of the most highly conserved regions have been selected as signature
patterns. The first pattern is located in the central section; the second one is in the C-terminal region.

Consensus pattern: P-G-G-R-F-x-E-x-Y-x-W-D-x-Y

Consensus pattern: Q-W-D-x-P-x-[GA]-W-[PAS]-P 

[1] Kopp M., Mueller H., Holzer H. J. Biol. Chem. 268:4766-4774(1993). [2] Henrissat B., Bairoch A. Biochem. J. 293:
781-788(1993). [E1]

[1539] 655. Trehalose-6-phosphate synthase domain 
[1540] OtsA (trehalose-6-phosphate synthase) is homologous to regions in the subunits of the yeast
trehalose-6-phosphate synthase/phosphatase complex [1].
[1541] [1] Kaasen I, McDougall J, Strom AR; Gene 1994;145:9-15. 
[1542] 656. Tropomyosins signature
Tropomyosins [1,2] are a family of closely related proteins present in muscle and non-muscle cells. In striated muscle,
tropomyosins mediate the interactions between the troponin complex and actin so as to regulate muscle contraction.
The role of tropomyosin in smooth muscle and non-muscle tissues is not clear. Tropomyosin is an alpha-helical protein
that forms a coiled-coil dimer. Muscle isoforms of tropomyosin are characterized by having 284 amino acid residues
and a highly conserved N-terminal region, whereas non-muscle forms are generally smaller and are heterogeneous
in their N-terminal region. The signature pattern for tropomyosins is based on a very conserved region in the C-terminal
section of tropomyosins, which is present in both muscle and non-muscle forms.
Consensus pattern: L-K-E-A-E-x-R-A-E 

[1] Smillie L.B. Trends Biochem. Sci. 4:151-155(1979). [2] McLeod A.R. BioEssays 6:208-212(1986).
[1543] 657. Troponin

Troponin (Tn) contains three subunits: Ca2+ binding (TnC), inhibitory (TnI), and tropomyosin binding (TnT). This Pfam
family contains members of the TnT subunit.

Troponin is a complex of three proteins: Ca2+ binding (TnC), inhibitory (TnI), and tropomyosin binding (TnT).
The troponin complex regulates Ca2+-induced muscle contraction.
This family includes troponin T and troponin I. Troponin I binds to actin and troponin T binds to tropomyosin.
Number of members: 81

[1] Medline: 87144593
Structure of co-crystals of tropomyosin and troponin.
White SP, Cohen C, Phillips GN Jr;
Nature 1987;325:826-828.

[2] Medline: 95155315
A direct regulatory role for troponin T and a dual role for troponin C in the Ca2+ regulation of muscle contraction.
Potter JD, Sheng Z, Pan BS, Zhao J;
J Biol Chem 1995;270:2557-2562.

[3] Medline: 95324796
The troponin complex and regulation of muscle contraction.
Farah CS, Reinach FC;
FASEB J 1995;9:755-767.
[1544] 658. (Tryp mucin) Mucin-like glycoprotein

[1545] This family of trypanosomal proteins resembles vertebrate mucins. The protein consists of three regions. The
N and C termini are conserved between all members of the family, whereas the central region is not well conserved
and contains a large number of threonine residues which can be glycosylated [1].
Indirect evidence suggested that these genes might encode the core protein of parasite mucins, glycoproteins that
were proposed to be involved in the interaction with, and invasion of, mammalian host cells.

[1] Di Noia JM, Sanchez DO, Frasch AC; J Biol Chem 1995;270:24146-24149. 

[2] Di Noia JM, D'Orso I, Aslund L, Sanchez DO, Frasch AC; J Biol Chem 1998;273:10843-10850. 

[1546] 659. Aminoacyl-transfer RNA synthetases class-I signature (tRNA synt 1)

Aminoacyl-tRNA synthetases (EC 6.1.1.-) [1] are a group of enzymes which activate amino acids and transfer them
to specific tRNA molecules as the first step in protein biosynthesis. In prokaryotic organisms there are at least twenty
different types of aminoacyl-tRNA synthetases, one for each different amino acid. In eukaryotes there are generally
two aminoacyl-tRNA synthetases for each different amino acid: one cytosolic form and a mitochondrial form. While all
these enzymes have a common function, they are widely diverse in terms of subunit size and of quaternary structure.
A few years ago it was found [2] that several aminoacyl-tRNA synthetases share a region of similarity in their N-terminal
section; in particular, the consensus tetrapeptide His-Ile-Gly-His ('HIGH') is very well conserved. The 'HIGH' region has
been shown [3] to be part of the adenylate binding site. The 'HIGH' signature has been found in the aminoacyl-tRNA
synthetases specific for arginine, cysteine, glutamic acid, glutamine, isoleucine, leucine, methionine, tyrosine,
tryptophan, and valine. These aminoacyl-tRNA synthetases are referred to as class-I synthetases [4,5,6] and seem to
share the same tertiary structure based on a Rossmann fold.

Consensus pattern: P-x(0,2)-[GSTAN]-[DENQGAPK]-x-[LIVMFP]-[HT]-[LIVMYAC]-G-[HNTG]-[LIVMFYSTAGPC]

[1] Schimmel P. Annu. Rev. Biochem. 56:125-158(1987). [2] Webster T., Tsai H., Kula M., Mackie G.A., Schimmel P.
Science 226:1315-1317(1984). [3] Brick P., Bhat T.N., Blow D.M. J. Mol. Biol. 208:83-98(1988). [4] Delarue M., Moras
D. BioEssays 15:675-687(1993). [5] Schimmel P. Trends Biochem. Sci. 16:1-3(1991). [6] Nagel G.M., Doolittle R.F.
Proc. Natl. Acad. Sci. U.S.A. 88:8121-8125(1991).

[1547] 660. Aminoacyl-transfer RNA synthetases class-I signature (tRNA synt 1b)

Aminoacyl-tRNA synthetases (EC 6.1.1.-) [1] are a group of enzymes which activate amino acids and transfer them
to specific tRNA molecules as the first step in protein biosynthesis. In prokaryotic organisms there are at least twenty
different types of aminoacyl-tRNA synthetases, one for each different amino acid. In eukaryotes there are generally
two aminoacyl-tRNA synthetases for each different amino acid: one cytosolic form and a mitochondrial form. While all
these enzymes have a common function, they are widely diverse in terms of subunit size and of quaternary structure.
A few years ago it was found [2] that several aminoacyl-tRNA synthetases share a region of similarity in their N-terminal
section; in particular, the consensus tetrapeptide His-Ile-Gly-His ('HIGH') is very well conserved. The 'HIGH' region has
been shown [3] to be part of the adenylate binding site. The 'HIGH' signature has been found in the aminoacyl-tRNA
synthetases specific for arginine, cysteine, glutamic acid, glutamine, isoleucine, leucine, methionine, tyrosine,
tryptophan, and valine. These aminoacyl-tRNA synthetases are referred to as class-I synthetases [4,5,6] and seem to
share the same tertiary structure based on a Rossmann fold.

Consensus pattern: P-x(0,2)-[GSTAN]-[DENQGAPK]-x-[LIVMFP]-[HT]-[LIVMYAC]-G-[HNTG]-[LIVMFYSTAGPC]

[1] Schimmel P. Annu. Rev. Biochem. 56:125-158(1987). [2] Webster T., Tsai H., Kula M., Mackie G.A., Schimmel P.
Science 226:1315-1317(1984). [3] Brick P., Bhat T.N., Blow D.M. J. Mol. Biol. 208:83-98(1988). [4] Delarue M., Moras
D. BioEssays 15:675-687(1993). [5] Schimmel P. Trends Biochem. Sci. 16:1-3(1991). [6] Nagel G.M., Doolittle R.F.
Proc. Natl. Acad. Sci. U.S.A. 88:8121-8125(1991).

[1548] 661. (tRNA-synt 1c) tRNA synthetases class I (E and Q)

[1549] Other tRNA synthetase sub-families are too dissimilar to be included.
This family includes only glutamyl and glutaminyl tRNA synthetases.

In some organisms, a single glutamyl-tRNA synthetase aminoacylates both tRNA(Glu) and tRNA(Gln).
[1550] [1] Rath VL, Silvian LF, Beijer B, Sproat BS, Steitz TA; Structure 1998;6:439-449.
[1551] 662. (tRNA-synt 1d) tRNA synthetases class I (R) 
[1552] Other tRNA synthetase sub-families are too dissimilar to be included. 
This family includes only arginyl tRNA synthetase. 

[1553] 663. Aminoacyl-transfer RNA synthetases class-II signatures (tRNA synt 2)

Aminoacyl-tRNA synthetases (EC 6.1.1.-) [1] are a group of enzymes which activate amino acids and transfer them
to specific tRNA molecules as the first step in protein biosynthesis. In prokaryotic organisms there are at least twenty
different types of aminoacyl-tRNA synthetases, one for each different amino acid. In eukaryotes there are generally
two aminoacyl-tRNA synthetases for each different amino acid: one cytosolic form and a mitochondrial form. While all
these enzymes have a common function, they are widely diverse in terms of subunit size and of quaternary structure.
The synthetases specific for alanine, asparagine, aspartic acid, glycine, histidine, lysine, phenylalanine, proline, serine,
and threonine are referred to as class-II synthetases [2 to 6] and probably have a common folding pattern in their
catalytic domain for the binding of ATP and amino acid which is different from the Rossmann fold observed for the class
I synthetases [7]. Class-II tRNA synthetases do not share a high degree of similarity; however, at least three conserved
regions are present [2,5,8]. Signature patterns have been derived from two of these regions.

Consensus pattern: [FYH]-R-x-[DE]-x(4,12)-[RH]-x(3)-F-x(3)-[DE]

Consensus pattern: [GSTALVF]-{DENQHRKP}-[GSTA]-[LIVMF]-[DE]-R-[LIVMF]-x-[LIVMSTAG]-[LIVMFY]
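
Because the first class-II pattern contains a variable gap, x(4,12), the stretch of sequence it can cover is not fixed. The sketch below, offered as an illustration under stated assumptions rather than as a method taken from the source, computes the shortest and longest span a pattern can match by summing the repeat counts of its elements. The function name `pattern_length_bounds` is hypothetical.

```python
import re
from typing import Tuple

def pattern_length_bounds(pattern: str) -> Tuple[int, int]:
    """Return the (shortest, longest) number of residues a consensus pattern spans.

    Hypothetical helper: each element counts as one residue unless it carries
    a repeat suffix such as (3) or (4,12).
    """
    shortest = longest = 0
    for element in pattern.replace(" ", "").split("-"):
        repeat = re.search(r"\((\d+)(?:,(\d+))?\)$", element)
        if repeat:
            lo = int(repeat.group(1))
            hi = int(repeat.group(2)) if repeat.group(2) else lo
        else:
            lo = hi = 1
        shortest += lo
        longest += hi
    return shortest, longest

# The two class-II signature patterns as written in this section.
first = pattern_length_bounds("[FYH]-R-x-[DE]-x(4,12)-[RH]-x(3)-F-x(3)-[DE]")
second = pattern_length_bounds(
    "[GSTALVF]-{DENQHRKP}-[GSTA]-[LIVMF]-[DE]-R-[LIVMF]-x-[LIVMSTAG]-[LIVMFY]"
)
print(first, second)  # (17, 25) (10, 10)
```

The first pattern therefore spans between 17 and 25 residues, while the second always spans exactly 10.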
[1] Schimmel P. Annu. Rev. Biochem. 56:125-158(1987). [2] Delarue M., Moras D. BioEssays 15:675-687(1993). [3]
Schimmel P. Trends Biochem. Sci. 16:1-3(1991). [4] Nagel G.M., Doolittle R.F. Proc. Natl. Acad. Sci. U.S.A. 88:
8121-8125(1991). [5] Cusack S., Haertlein M., Leberman R. Nucleic Acids Res. 19:3489-3498(1991). [6] Cusack S.
Biochimie 75:1077-1081(1993). [7] Cusack S., Berthet-Colominas C., Haertlein M., Nassar N., Leberman R. Nature
347:249-255(1990). [8] Leveque F., Plateau P., Dessen P., Blanquet S. Nucleic Acids Res. 18:305-312(1990).
[1554] 664. Aminoacyl-transfer RNA synthetases class-I signature (tRNA synt 1e)

Aminoacyl-tRNA synthetases (EC 6.1.1.-) [1] are a group of enzymes which activate amino acids and transfer them
to specific tRNA molecules as the first step in protein biosynthesis. In prokaryotic organisms there are at least twenty
different types of aminoacyl-tRNA synthetases, one for each different amino acid. In eukaryotes there are generally
two aminoacyl-tRNA synthetases for each different amino acid: one cytosolic form and a mitochondrial form. While all
these enzymes have a common function, they are widely diverse in terms of subunit size and of quaternary structure.
A few years ago it was found [2] that several aminoacyl-tRNA synthetases share a region of similarity in their N-terminal
section; in particular, the consensus tetrapeptide His-Ile-Gly-His ('HIGH') is very well conserved. The 'HIGH' region has
been shown [3] to be part of the adenylate binding site. The 'HIGH' signature has been found in the aminoacyl-tRNA
synthetases specific for arginine, cysteine, glutamic acid, glutamine, isoleucine, leucine, methionine, tyrosine,
tryptophan, and valine. These aminoacyl-tRNA synthetases are referred to as class-I synthetases [4,5,6] and seem to
share the same tertiary structure based on a Rossmann fold.

Consensus pattern: P-x(0,2)-[GSTAN]-[DENQGAPK]-x-[LIVMFP]-[HT]-[LIVMYAC]-G-[HNTG]-[LIVMFYSTAGPC]
[1] Schimmel P. Annu. Rev. Biochem. 56:125-158(1987). [2] Webster T., Tsai H., Kula M., Mackie G.A., Schimmel P.
Science 226:1315-1317(1984). [3] Brick P., Bhat T.N., Blow D.M. J. Mol. Biol. 208:83-98(1988). [4] Delarue M., Moras
D. BioEssays 15:675-687(1993). [5] Schimmel P. Trends Biochem. Sci. 16:1-3(1991). [6] Nagel G.M., Doolittle R.F.
Proc. Natl. Acad. Sci. U.S.A. 88:8121-8125(1991).
[1555] 665. Aminoacyl-transfer RNA synthetases class-II signatures (tRNA synt 2b)

Aminoacyl-tRNA synthetases (EC 6.1.1.-) [1] are a group of enzymes which activate amino acids and transfer them
to specific tRNA molecules as the first step in protein biosynthesis. In prokaryotic organisms there are at least twenty
different types of aminoacyl-tRNA synthetases, one for each different amino acid. In eukaryotes there are generally
two aminoacyl-tRNA synthetases for each different amino acid: one cytosolic form and a mitochondrial form. While all
these enzymes have a common function, they are widely diverse in terms of subunit size and of quaternary structure.
The synthetases specific for alanine, asparagine, aspartic acid, glycine, histidine, lysine, phenylalanine, proline, serine,
and threonine are referred to as class-II synthetases [2 to 6] and probably have a common folding pattern in their
catalytic domain for the binding of ATP and amino acid which is different from the Rossmann fold observed for the class
I synthetases [7]. Class-II tRNA synthetases do not share a high degree of similarity; however, at least three conserved
regions are present [2,5,8]. Signature patterns have been derived from two of these regions.
Consensus pattern: [FYH]-R-x-[DE]-x(4,12)-[RH]-x(3)-F-x(3)-[DE]

Consensus pattern: [GSTALVF]-{DENQHRKP}-[GSTA]-[LIVMF]-[DE]-R-[LIVMF]-x-[LIVMSTAG]-[LIVMFY]

[1] Schimmel P. Annu. Rev. Biochem. 56:125-158(1987). [2] Delarue M., Moras D. BioEssays 15:675-687(1993). [3]
Schimmel P. Trends Biochem. Sci. 16:1-3(1991). [4] Nagel G.M., Doolittle R.F. Proc. Natl. Acad. Sci. U.S.A. 88:
8121-8125(1991). [5] Cusack S., Haertlein M., Leberman R. Nucleic Acids Res. 19:3489-3498(1991). [6] Cusack S.
Biochimie 75:1077-1081(1993). [7] Cusack S., Berthet-Colominas C., Haertlein M., Nassar N., Leberman R. Nature
347:249-255(1990). [8] Leveque F., Plateau P., Dessen P., Blanquet S. Nucleic Acids Res. 18:305-312(1990).
[1556] 666. Thaumatin family signature 
Thaumatin [1] is an intensely sweet-tasting protein (100 000 times sweeter than sucrose on a molar basis) from
Thaumatococcus daniellii, an African bush. The protein is made of about 200 residues and contains 8 disulfide bonds.
A number of proteins have been found to be related to thaumatins. These proteins are listed below (references are only
provided for recently determined sequences). - A maize alpha-amylase/trypsin inhibitor. - Two tobacco
pathogenesis-related proteins: PR-R major and minor forms, which are induced after infection with viruses. - Salt-induced protein
NP24 from tomato. - Osmotin, a salt-induced protein from tobacco. - Osmotin-like proteins OSML13, OSML15 and
OSML81 from potato [2]. - P21, a leaf protein from soybean. - PWIR2, a leaf protein from wheat. - Zeamatin, a maize
antifungal protein [3]. The exact biological function of all these proteins is not yet known. A conserved region that includes
three cysteine residues known (in thaumatin) to be involved in disulfide bonds has been selected as a signature pattern.
[Schematic representation: positions of the conserved cysteines in thaumatin and the disulfide bonds that link them.]

'C': conserved cysteine involved in a disulfide bond. '*': position of the pattern.

Consensus pattern: G-x-[GF]-x-C-x-T-[GA]-D-C-x(1,2)-G-x(2,3)-C
[1] Edens L., Heslinga L., Klok P., Ledeboer A.M., Maat J., Toonen M.Y., Visser C., Verrips C.T. Gene 18:1-12(1982).
[2] Zhu B., Chen T.H.H., Li P.H. Plant Physiol. 108:929-937(1995). [3] Malehorn D.E., Borgmeyer J.R., Smith C.E.,
Shah D.M. Plant Physiol. 106:1471-1481(1994).
[1557] 667. Thiolases signatures 

Two different types of thiolase [1,2,3] are found both in eukaryotes and in prokaryotes: acetoacetyl-CoA thiolase (EC
2.3.1.9) and 3-ketoacyl-CoA thiolase (EC 2.3.1.16). 3-ketoacyl-CoA thiolase (also called thiolase I) has a broad
chain-length specificity for its substrates and is involved in degradative pathways such as fatty acid beta-oxidation.
Acetoacetyl-CoA thiolase (also called thiolase II) is specific for the thiolysis of acetoacetyl-CoA and involved in biosynthetic
pathways such as poly beta-hydroxybutyrate synthesis or steroid biogenesis. In eukaryotes, there are two forms of
3-ketoacyl-CoA thiolase: one located in the mitochondrion and the other in peroxisomes. There are two conserved
cysteine residues important for thiolase activity. The first, located in the N-terminal section of the enzymes, is involved
in the formation of an acyl-enzyme intermediate; the second, located at the C-terminal extremity, is the active site base
involved in deprotonation in the condensation reaction. Mammalian nonspecific lipid-transfer protein (nsL-TP) (also
known as sterol carrier protein 2) is a protein which seems to exist in two different forms: a 14 Kd protein (SCP-2) and
a larger 58 Kd protein (SCP-x). The former is found in the cytoplasm or the mitochondria and is involved in lipid transport;
the latter is found in peroxisomes.

The C-terminal part of SCP-x is identical to SCP-2 while the N-terminal portion is evolutionarily related to thiolases [4].
Three signature patterns have been developed for this family of proteins, two of which are based on the regions around
the biologically important cysteines. The third is based on a highly conserved region in the C-terminal part of these
proteins.
Consensus pattern: [LIVM]-[NST]-x(2)-C-[SAGLI]-[ST]-[SAG]-[LIVMFYNS]-x-[STAG]-[LIVM]-x(6)-[LIVM] [C is involved
in formation of acyl-enzyme intermediate]

Consensus pattern: N-x(2)-G-G-x-[LIVM]-[SA]-x-G-H-P-x-[GA]-x-[ST]-G

Consensus pattern: [AG]-[LIVMA]-[STAGCLIVM]-[STAG]-[LIVMA]-C-x-[AG]-x-[AG]-x-[AG]-x-[SAG] [C is the active site
residue]

[1] Peoples O.P., Sinskey A.J. J. Biol. Chem. 264:15293-15297(1989). [2] Yang S.-Y., Yang X.-Y.H., Healy-Louie G.,
Schulz H., Elzinga M. J. Biol. Chem. 265:10424-10429(1990). [3] Igual J.C., Gonzalez-Bosch C., Dopazo J.,
Perez-Ortin J.E. J. Mol. Evol. 35:147-155(1992). [4] Baker M.E., Billheimer J.T., Strauss J.F. III DNA Cell Biol. 10:695-698
(1991).

[1558] 668. Thioredoxin family active site 
Thioredoxins [1 to 4] are small proteins of approximately one hundred amino-acid residues which participate in various
redox reactions via the reversible oxidation of an active center disulfide bond. They exist in either a reduced form or
an oxidized form where the two cysteine residues are linked in an intramolecular disulfide bond. Thioredoxin is present
in prokaryotes and eukaryotes and the sequence around the redox-active disulfide bond is well conserved.
Bacteriophage T4 also encodes a thioredoxin but its primary structure is not homologous to bacterial, plant and vertebrate
thioredoxins. A number of eukaryotic proteins contain domains evolutionarily related to thioredoxin; all of them seem to
be protein disulphide isomerases (PDI). PDI (EC 5.3.4.1) [5,6,7] is an endoplasmic reticulum enzyme that catalyzes
the rearrangement of disulfide bonds in various proteins. The various forms of PDI which are currently known are: -
PDI major isozyme; a multifunctional protein that also functions as the beta subunit of prolyl 4-hydroxylase (EC
1.14.11.2), as a component of oligosaccharyl transferase (EC 2.4.1.119), as thyroxine deiodinase (EC 3.8.1.4), as
glutathione-insulin transhydrogenase (EC 1.8.4.2) and as a thyroid hormone-binding protein. - ERp60 (ER-60; 58 Kd
microsomal protein). ERp60 was originally thought to be a phosphoinositide-specific phospholipase C isozyme and
later to be a protease. - ERp72. - P5. All PDI contain two or three (ERp72) copies of the thioredoxin domain. Bacterial
proteins that act as thiol:disulfide interchange proteins that allow disulfide bond formation in some periplasmic proteins
also contain a thioredoxin domain. These proteins are: - Escherichia coli dsbA (or prfA) and its orthologs in Vibrio
cholerae (tcpG) and Haemophilus influenzae (por). - Escherichia coli dsbC (or xprA) and its orthologs in Erwinia
chrysanthemi and Haemophilus influenzae. - Escherichia coli dsbD (or dipZ) and its Haemophilus influenzae ortholog.
- Escherichia coli dsbE (or ccmG) and orthologs in Haemophilus influenzae, Rhodobacter capsulatus (helX),
Rhizobiaceae (cycY and tlpA).

Consensus pattern: [LIVMF]-[LIVMSTA]-x-[LIVMFYC]-[FYWSTHE]-x(2)-[FYWGTN]-C-[GATPLVE]-[PHYWSTA]-C-x(6)-[LIVMFYWT]
[The two C's form the redox-active bond]
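
To show how the thioredoxin pattern singles out the two redox-active cysteines, the sketch below writes the pattern by hand as a regular expression with a capturing group around each cysteine and reports their positions. It is an illustration under stated assumptions, not a procedure described in the source: the function name `redox_cysteines` is hypothetical, and the test fragment is an illustrative sequence built around a W-C-G-P-C motif.

```python
import re

# Thioredoxin family active-site pattern; each C is captured so that the
# positions of the two redox-active cysteines can be reported.
THIOREDOXIN_ACTIVE_SITE = re.compile(
    r"[LIVMF][LIVMSTA].[LIVMFYC][FYWSTHE].{2}[FYWGTN]"
    r"(C)[GATPLVE][PHYWSTA](C).{6}[LIVMFYWT]"
)

def redox_cysteines(sequence: str):
    """Return the 1-based positions of the two active-site cysteines, or None."""
    hit = THIOREDOXIN_ACTIVE_SITE.search(sequence)
    if hit is None:
        return None
    return hit.start(1) + 1, hit.start(2) + 1

# Illustrative fragment (assumption), not a sequence from this document.
fragment = "AILVDFWAEWCGPCKMIAPILDEIA"
print(redox_cysteines(fragment))  # (11, 14)
```

The two cysteines are always separated by exactly two residues, which is why the motif is often summarized as C-x-x-C.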

[1] Holmgren A. Annu. Rev. Biochem. 54:237-271(1985). [2] Gleason F.K., Holmgren A. FEMS Microbiol. Rev. 54:
271-297(1988). [3] Holmgren A. J. Biol. Chem. 264:13963-13966(1989). [4] Eklund H., Gleason F.K., Holmgren A.
Proteins 11:13-28(1991). [5] Freedman R.B., Hawkins H.C., Murant S.J., Reid L. Biochem. Soc. Trans. 16:96-99(1988).
[6] Kivirikko K.I., Myllyla R., Pihlajaniemi T. FASEB J. 3:1609-1617(1989). [7] Freedman R.B., Hirst T.R., Tuite M.F.
Trends Biochem. Sci. 19:331-336(1994).

[1559] 669. (Transcript fac2) Transcription factor TFIIB repeat signature

In eukaryotes the initiation of transcription of protein-encoding genes by polymerase II is modulated by general and
specific transcription factors. The general transcription factors operate through common promoter elements (such as
the TATA box). At least seven different proteins associate to form the general transcription factors: TFIIA, -IIB, -IID,
-IIE, -IIF, -IIG, and -IIH [1]. Transcription factor IIB (TFIIB) plays a central role in the transcription of class II genes; it
associates with a complex of TFIID-IIA bound to DNA (DA complex) to form a ternary complex TFIID-IIA-IIB (DAB
complex) which is then recognized by RNA polymerase II [2,3]. TFIIB is a protein of about 315 to 340 amino acid
residues which contains, in its C-terminal part, an imperfect repeat of a domain of about 75 residues. This repeat could
contribute an element of symmetry to the folded protein. The following proteins have been shown to be evolutionarily
related to TFIIB: - An archaebacterial TFIIB homolog. In Pyrococcus woesei a previously undetected open reading
frame has been shown [4] to be highly related to TFIIB. - Fungal transcription factor IIIB 70 Kd subunit (gene
PCF4/TDS4/BRF1) [5]. This protein is a general activator of RNA polymerase III transcription and plays a role analogous
to that of TFIIB in pol III transcription. The central section of the repeated domain, which is the most conserved part of
that domain, has been selected as a signature pattern.

Consensus pattern: G-[KR]-x(3)-[STAGN]-x-[LIVMYA]-[GSTA](2)-[CSAV]-[LIVM]-[LIVMFY]-[LIVMA]-[GSA]-[STAC]
[1] Weinmann R. Gene Expr. 2:81-91(1992). [2] Hawley D. Trends Biochem. Sci. 16:317-318(1991). [3] Ha I., Lane
W.S., Reinberg D. Nature 352:689-695(1991). [4] Ouzounis C., Sander C. Cell 71:189-190(1992). [5] Khoo B., Brophy
B., Jackson S.P. Genes Dev. 8:2879-2890(1994).
[1560] 670. (transcript fact) MADS-box domain signature and profile

A number of transcription factors contain a conserved domain of 56 amino-acid residues, sometimes known as the
MADS-box domain [E1]. They are listed below: - Serum response factor (SRF) [1], a mammalian transcription factor
that binds to the Serum Response Element (SRE). This is a short sequence of dyad symmetry located 300 bp to the
5' end of the transcription initiation site of genes such as c-fos. - Mammalian myocyte-specific enhancer factors 2A to
2D (MEF2A to MEF2D). These proteins are transcription factors which bind specifically to the MEF2 element present
in the regulatory regions of many muscle-specific genes. - Drosophila myocyte-specific enhancer factor 2 (MEF2). -
Yeast GRM/PRTF protein (gene MCM1) [2], a transcriptional regulator of mating-type-specific genes. - Yeast arginine
metabolism regulation protein I (gene ARGR1 or ARG80). - Yeast transcription factor RLM1. - Yeast transcription factor
SMP1. - Arabidopsis thaliana agamous protein (AG) [3], a probable transcription factor involved in regulating genes
that determine stamen and carpel development in wild-type flowers. Mutations in the AG gene result in the replacement
of the stamens by petals and the carpels by a new flower. - Arabidopsis thaliana homeotic proteins Apetala1 (AP1),
Apetala3 (AP3) and Pistillata (PI) which act locally to specify the identity of the floral meristem and to determine sepal
and petal development [4]. - Antirrhinum majus and tobacco homeotic proteins deficiens (DEFA) and globosa (GLO)
[5]. Both proteins are transcription factors involved in the genetic control of flower development. Mutations in DEFA or
GLO cause the transformation of petals into sepals and of stamina into carpels. - Arabidopsis thaliana putative
transcription factors AGL1 to AGL6 [6]. - Antirrhinum majus morphogenetic protein DEF H33 (squamosa). In SRF, the
conserved domain has been shown [1] to be involved in DNA-binding and dimerization. A pattern that spans the
complete length of the domain has been derived. The profile also spans the length of the MADS-box.
Consensus pattern: R-x-[RK]-x(5)-I-x-[DNGSK]-x(3)-[KR]-x(2)-T-[FY]-x-[RK](3)-x(2)-[LIVM]-x-K(2)-A-x-E-[LIVM]-
[STA]-x-L-x(4)-[LIVM]-x-[LIVM](3)-x(6)-[LIVMF]-x(2)-[FY]

[1] Norman C., Runswick M., Pollock R., Treisman R. Cell 55:989-1003(1988). [2] Passmore S., Maine G.T., Elble R.,
Christ C., Tye B.-K. J. Mol. Biol. 204:593-606(1988). [3] Yanofsky M., Ma H., Bowman J., Drews G., Feldmann K.A.,
Meyerowitz E.M. Nature 346:35-39(1990). [4] Goto K., Meyerowitz E.M. Genes Dev. 8:1548-1560(1994). [5] Troebner
W., Ramirez L., Motte P., Hue I., Huijser P., Loennig W.-E., Saedler H., Sommer H., Schwartz-Sommer Z. EMBO J.
11:4693-4704(1992). [6] Ma H., Yanofsky M.F., Meyerowitz E.M. Genes Dev. 5:484-495(1991). [E1]
[1561] 671. Transketolase signatures

Transketolase (EC 2.2.1.1) (TK) catalyzes the reversible transfer of a two-carbon ketol unit from xylulose 5-phosphate
to an aldose receptor, such as ribose 5-phosphate, to form sedoheptulose 7-phosphate and glyceraldehyde
3-phosphate. This enzyme, together with transaldolase, provides a link between the glycolytic and pentose-phosphate
pathways. TK requires thiamin pyrophosphate as a cofactor. In most sources where TK has been purified, it is a homodimer
of approximately 70 Kd subunits. TK sequences from a variety of eukaryotic and prokaryotic sources [1,2] show that
the enzyme has been evolutionarily conserved. In the peroxisomes of the methylotrophic yeast Hansenula polymorpha,
there is a highly related enzyme, dihydroxy-acetone synthase (DHAS) (EC 2.2.1.3) (also known as formaldehyde
transketolase), which exhibits a very unusual specificity by including formaldehyde amongst its substrates.
1-deoxyxylulose-5-phosphate synthase (DXP synthase) [3] is an enzyme so far found in bacteria (gene dxs) and plants (gene
CLA1) which catalyzes the thiamin pyrophosphate-dependent acyloin condensation reaction between carbon atoms
2 and 3 of pyruvate and glyceraldehyde 3-phosphate to yield 1-deoxy-D-xylulose-5-phosphate (DXP), a precursor in
the biosynthetic pathway to isoprenoids, thiamin (vitamin B1), and pyridoxol (vitamin B6). DXP synthase is evolutionarily
related to TK. Two regions of TK have been selected as signature patterns. The first, located in the N-terminal section,
contains a histidine residue which appears to function in proton transfer during catalysis [4]. The second, located in the
central section, contains conserved acidic residues that are part of the active cleft and may participate in substrate
binding [4].
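
The ketol transfers catalyzed by transketolase (two carbons) and by transaldolase (three carbons) can be checked with simple carbon bookkeeping: the donor sugar shortens by the size of the transferred unit and the acceptor lengthens by the same amount. The sketch below is an illustrative calculation only; the dictionary `CARBONS` and the function `transfer` are hypothetical names introduced here.

```python
# Number of carbon atoms in each sugar phosphate named in the text.
CARBONS = {
    "xylulose 5-phosphate": 5,
    "ribose 5-phosphate": 5,
    "sedoheptulose 7-phosphate": 7,
    "glyceraldehyde 3-phosphate": 3,
    "erythrose 4-phosphate": 4,
    "fructose 6-phosphate": 6,
}

def transfer(donor: str, acceptor: str, unit: int):
    """Return carbon counts of (shortened donor, lengthened acceptor)."""
    return CARBONS[donor] - unit, CARBONS[acceptor] + unit

# Transketolase: a two-carbon unit moves from xylulose 5-phosphate to ribose
# 5-phosphate, giving a C3 product and a C7 product.
tk = transfer("xylulose 5-phosphate", "ribose 5-phosphate", 2)

# Transaldolase: a three-carbon unit moves from sedoheptulose 7-phosphate to
# glyceraldehyde 3-phosphate, giving a C4 product and a C6 product.
ta = transfer("sedoheptulose 7-phosphate", "glyceraldehyde 3-phosphate", 3)

print(tk, ta)  # (3, 7) (4, 6)
```

The resulting carbon counts correspond to glyceraldehyde 3-phosphate and sedoheptulose 7-phosphate for transketolase, and to erythrose 4-phosphate and fructose 6-phosphate for transaldolase.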

Consensus pattern: R-x(3)-[LIVMTA]-[DENQSTHKF]-x(5,6)-[GSN]-G-H-[PLIVMF]-[GSTA]-x(2)-[LIMC]-[GS]
Consensus pattern: G-[DEQGSA]-[DN]-G-[PAEQ]-[ST]-[HQ]-x-[PAGM]-[LIVMYAC]-[DEFYW]-x(2)-[STAP]-x(2)-[RGA]

[1] Abedinia M., Layfield R., Jones S.M., Nixon P.F., Mattick J.S. Biochem. Biophys. Res. Commun. 183:1159-1166
(1992). [2] Fletcher T.S., Kwee I.L., Nakada T., Largman C., Martin B.M. Biochemistry 31:1892-1896(1992). [3]
Sprenger G.A., Schorken U., Wiegert T., Grolle S., De Graaf A.A., Taylor S.V., Begley T.P., Bringer-Meyer S., Sahm
H. Proc. Natl. Acad. Sci. U.S.A. 94:12857-12862(1997). [4] Lindqvist Y., Schneider G., Ermler U., Sundstroem M. EMBO
J. 11:2373-2379(1992).

[1562] 672. Transmembrane 4 family signature

Recently a number of eukaryotic cell surface antigens have been found to be evolutionarily related [1,2,3]. The proteins
known to belong to this family are listed below: - Mammalian antigen CD9 (MIC3); a protein involved in platelet activation
and aggregation. - Mammalian leukocyte antigen CD37, expressed on B lymphocytes. - Mammalian leukocyte antigen
CD53 (OX-44), which may be involved in growth regulation in hematopoietic cells. - Mammalian lysosomal membrane
protein CD63 (melanoma-associated antigen ME491; antigen AD1). - Mammalian antigen CD81 (cell surface protein
TAPA-1), which may play an important role in the regulation of lymphoma cell growth. - Mammalian antigen CD82
(protein R2; antigen C33; Kangai 1 (KAI1)), which associates with CD4 or CD8 and delivers costimulatory signals for
the TCR/CD3 pathway. - Mammalian antigen CD151 (SFA-1; platelet-endothelial tetraspan antigen 3 (PETA-3)). -
Mammalian cell surface glycoprotein A15 (TALLA-1; MXS1). - Mammalian novel antigen 2 (NAG-2). - Human
tumor-associated antigen CO-029. - Schistosoma mansoni and japonicum 23 Kd surface antigen (SM23 / SJ23). These
proteins share the following characteristics: they all seem to be type III membrane proteins (type III proteins are integral
membrane proteins that contain an N-terminal membrane-anchoring domain which is not cleaved during biosynthesis
and which functions both as a translocation signal and as a membrane anchor); they also contain three additional
transmembrane regions, at least seven conserved cysteine residues, and are of approximately the same size (218

55 to 284 residues). These proteins are collectively know as the 'transmembrane 4 super family* (TM4) because they span 
tho plasma membrane four times. A schematic diagram of the domain structure of these proteins isshown below. +- 

+ + + .„_ + + + + +-— + II TMa I Extra I TM2I Cyt I TM3 I Extracellular I TM4 I 

Cyti +-+ + +— -C C—--+ CC C- — C- --+ C----+ ********* Cyt: cytoplasmic domain. TMa: 
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transmembrane anchor.TM2 to TM4: transmembrane regions 2 to 4.'C: conserved cysteine. '*' : position of the pattern. 
A conserved region that includes two cysteines and seems to be located in a short cytoplasmic loop between two 
transmembrane domains has been selected as a signature tor these proteins. i . 

Consensus pattern: G-x(3)-[LIVMF]-x(2)-[GSA]-[LIVMF](2)-G-C-x-[GA]-[STA]-x(2)-[EG]-x(2)-[CWN]-[LIVM](2)
[1] Levy S., Nguyen V.Q., Andria M.L., Takahashi S. J. Biol. Chem. 266:14597-14602(1991). [2] Tomlinson M.G.,
Williams A.F., Wright M.D. Eur. J. Immunol. 23:136-40(1993). [3] Barclay A.N., Birkeland M.L., Brown M.H., Beyers A.D.,
Davis S.J., Somoza C., Williams A.F. The leucocyte antigen factbooks. Academic Press, London / San Diego, (1993).
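The consensus patterns in these entries follow a compact notation: a single letter is one residue, x is any residue, [..] lists allowed residues, {..} lists forbidden residues, (n) or (n,m) is a repeat count, and < or > anchors the pattern at the N- or C-terminus. The following is a minimal sketch, not part of the original disclosure, of how such a pattern might be translated into a regular expression for scanning a protein sequence. The function name `prosite_to_regex` and the test peptide are assumptions made for illustration only.

```python
import re

# One pattern element: a core (letter, x, [set] or {set}) with an optional
# repeat count such as (3) or (4,12).
_ELEMENT = re.compile(r"(.+?)(?:\((\d+)(?:,(\d+))?\))?")


def prosite_to_regex(pattern):
    """Translate a consensus pattern such as 'G-x(3)-[LIVMF]' into a regex string."""
    parts = []
    for raw in pattern.strip().split("-"):
        element = raw.strip()
        prefix = suffix = ""
        if element.startswith("<"):      # '<' anchors at the N-terminus
            prefix, element = "^", element[1:]
        if element.endswith(">"):        # '>' anchors at the C-terminus
            suffix, element = "$", element[:-1]
        m = _ELEMENT.fullmatch(element)
        if m is None:
            raise ValueError(f"cannot parse pattern element {raw!r}")
        core, low, high = m.groups()
        if core == "x":                                   # any residue
            body = "."
        elif core.startswith("[") and core.endswith("]"):  # allowed residues
            body = core
        elif core.startswith("{") and core.endswith("}"):  # forbidden residues
            body = "[^" + core[1:-1] + "]"
        elif len(core) == 1 and core.isalpha():            # a fixed residue
            body = core
        else:
            raise ValueError(f"cannot parse pattern element {raw!r}")
        if low is not None:
            body += "{" + low + ("," + high if high else "") + "}"
        parts.append(prefix + body + suffix)
    return "".join(parts)


TM4_PATTERN = ("G-x(3)-[LIVMF]-x(2)-[GSA]-[LIVMF](2)-G-C-x-[GA]-[STA]-"
               "x(2)-[EG]-x(2)-[CWN]-[LIVM](2)")
TM4_REGEX = re.compile(prosite_to_regex(TM4_PATTERN))

# Hypothetical peptide built by hand to satisfy the pattern; not a real protein.
TOY_PEPTIDE = "GAAALAAGLLGCAGSAAEAACLL"
```

A pattern with a dangling element (for example a trailing hyphen left by truncation) raises `ValueError` in this sketch rather than being silently completed.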
[1563] 673. Tryptophan synthase alpha chain signature 

Tryptophan synthase catalyzes the last step in the biosynthesis of tryptophan: the conversion of indoleglycerol
phosphate and serine to tryptophan and glyceraldehyde 3-phosphate [1,2]. It has two functional domains: one for the aldol
cleavage of indoleglycerol phosphate to indole and glyceraldehyde 3-phosphate and the other for the synthesis of
tryptophan from indole and serine. In bacteria and plants [3], each domain is found on a separate subunit (alpha and
beta chains), while in fungi the two domains are fused together on a single multifunctional protein. A conserved region
that contains three conserved acidic residues has been selected as a signature pattern for the alpha chain. The first
and the third acidic residues are believed to serve as proton donors/acceptors in the enzyme's catalytic mechanism.
Consensus pattern: [LIVM]-E-[LIVM]-G-x(2)-[FYC]-[ST]-[DE]-[PA]-[LIVMY]-[AGLI]-[DE]-G

[1] Crawford I.P. Annu. Rev. Microbiol. 43:567-600(1989). [2] Hyde C.C., Miles E.W. Bio/Technology 8:27-32(1990).
[3] Berlyn M.B., Last R.L., Fink G.R. Proc. Natl. Acad. Sci. U.S.A. 86:4604-4608(1989).
[1564] 674. Tryptophan synthase beta chain pyridoxal-phosphate attachment site 

Tryptophan synthase catalyzes the last step in the biosynthesis of tryptophan: the conversion of indoleglycerol
phosphate and serine to tryptophan and glyceraldehyde 3-phosphate [1,2]. It has two functional domains: one for the aldol
cleavage of indoleglycerol phosphate to indole and glyceraldehyde 3-phosphate and the other for the synthesis of
tryptophan from indole and serine. In bacteria and plants [3], each domain is found on a separate subunit (alpha and
beta chains), while in fungi the two domains are fused together on a single multifunctional protein. The beta chain of
the enzyme requires pyridoxal-phosphate as a cofactor. The pyridoxal-phosphate group is attached to a lysine residue.
The region around this lysine residue also contains two histidine residues which are part of the pyridoxal-phosphate
binding site. The signature pattern for the tryptophan synthase beta chain is derived from that conserved region.

Consensus pattern: [LIVM]-x-H-x-G-[STA]-H-K-x-N [K is the pyridoxal-P attachment site]

[1] Crawford I.P. Annu. Rev. Microbiol. 43:567-600(1989). [2] Hyde C.C., Miles E.W. Bio/Technology 8:27-32(1990).
[3] Berlyn M.B., Last R.L., Fink G.R. Proc. Natl. Acad. Sci. U.S.A. 86:4604-4608(1989).
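Because the pattern [LIVM]-x-H-x-G-[STA]-H-K-x-N marks a specific residue (the lysine that carries pyridoxal-phosphate), a scan can report the position of that residue rather than only the start of the match. The sketch below is illustrative only: the regular expression is a hand translation of the pattern, and the names `PLP_SITE`, `plp_lysine_positions` and the toy sequence are assumptions, not data from the disclosure.

```python
import re

# Hand translation of [LIVM]-x-H-x-G-[STA]-H-K-x-N.
# The named group "lys" captures the lysine annotated as the attachment site.
PLP_SITE = re.compile(r"[LIVM].H.G[STA]H(?P<lys>K).N")


def plp_lysine_positions(seq):
    """Return 1-based positions of candidate pyridoxal-phosphate lysines."""
    return [m.start("lys") + 1 for m in PLP_SITE.finditer(seq)]


# Hypothetical sequence assembled by hand so that it contains one match.
TOY_BETA_CHAIN = "MASL" + "LNHTGSHKIN" + "AAGK"
```

In this toy sequence the final lysine is not reported, because it lacks the two flanking histidines that the pattern requires.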
[1565] 675. Serine proteases, trypsin family, active sites 

The catalytic activity of the serine proteases from the trypsin family is provided by a charge relay system involving an
aspartic acid residue hydrogen-bonded to a histidine, which itself is hydrogen-bonded to a serine. The sequences in
the vicinity of the active site serine and histidine residues are well conserved in this family of proteases [1]. A partial
list of proteases known to belong to the trypsin family is shown below. - Acrosin. - Blood coagulation factors VII, IX, X,
XI and XII, thrombin, plasminogen, and protein C. - Cathepsin G. - Chymotrypsins. - Complement components C1r,
C1s, C2, and complement factors B, D and I. - Complement-activating component of RA-reactive factor. - Cytotoxic
cell proteases (granzymes A to H). - Duodenase I. - Elastases 1, 2, 3A, 3B (protease E), leukocyte (medullasin). -
Enterokinase (EC 3.4.21.9) (enteropeptidase). - Hepatocyte growth factor activator. - Hepsin. - Glandular (tissue)
kallikreins (including EGF-binding protein types A, B, and C, NGF-gamma chain, gamma-renin, prostate specific antigen
(PSA) and tonin). - Plasma kallikrein. - Mast cell proteases (MCP) 1 (chymase) to 8. - Myeloblastin (proteinase 3)
(Wegener's autoantigen). - Plasminogen activators (urokinase-type, and tissue-type). - Trypsins I, II, III, and IV. -
Tryptases. - Snake venom proteases such as ancrod, batroxobin, cerastobin, flavoxobin, and protein C activator. -
Collagenase from common cattle grub and collagenolytic protease from Atlantic sand fiddler crab. - Apolipoprotein(a). -
Blood fluke cercarial protease. - Drosophila trypsin like proteases: alpha, easter, snake-locus. - Drosophila protease
stubble (gene sb). - Major mite fecal allergen Der p III. All the above proteins belong to family S1 in the classification
of peptidases [2,E1] and originate from eukaryotic species. It should be noted that bacterial proteases that belong to
family S2A are similar enough in the regions of the active site residues that they can be picked up by the same patterns.
These proteases are listed below. - Achromobacter lyticus protease I. - Lysobacter alpha-lytic protease. - Streptogrisin
A and B (Streptomyces proteases A and B). - Streptomyces griseus glutamyl endopeptidase II. - Streptomyces fradiae
proteases 1 and 2.

Consensus pattern: [LIVM]-[ST]-A-[STAG]-H-C [H is the active site residue]
Consensus pattern: [DNSTAGC]-[GSTAPIMVQH]-x(2)-G-[DE]-S-G-[GS]-[SAPHV]-[LIVMFYWH]-[LIVMFYSTANQH]
[S is the active site residue]
[1] Brenner S. Nature 334:528-530(1988). [2] Rawlings N.D., Barrett A.J. Meth. Enzymol. 244:19-61(1994). [E1]
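The two trypsin-family patterns each locate one member of the charge relay system, the histidine and the serine, so a sequence can be screened for both at once. The following sketch assumes hand-translated regular expressions for the two patterns; the names `HIS_SITE`, `SER_SITE`, `trypsin_active_sites` and the toy sequence are hypothetical and serve only to illustrate the idea.

```python
import re

# [LIVM]-[ST]-A-[STAG]-H-C, with the active site histidine captured as "h".
HIS_SITE = re.compile(r"[LIVM][ST]A[STAG](?P<h>H)C")

# [DNSTAGC]-[GSTAPIMVQH]-x(2)-G-[DE]-S-G-[GS]-[SAPHV]-[LIVMFYWH]-[LIVMFYSTANQH],
# with the active site serine captured as "s".
SER_SITE = re.compile(
    r"[DNSTAGC][GSTAPIMVQH].{2}G[DE](?P<s>S)G[GS][SAPHV][LIVMFYWH][LIVMFYSTANQH]"
)


def trypsin_active_sites(seq):
    """Return 1-based positions of the first His and Ser hits (None if absent)."""
    his = HIS_SITE.search(seq)
    ser = SER_SITE.search(seq)
    return {
        "his": his.start("h") + 1 if his else None,
        "ser": ser.start("s") + 1 if ser else None,
    }


# Hypothetical sequence written by hand to contain both motifs once.
TOY_PROTEASE = "IVGGWVVTAAHCGVKKKDSCQGDSGGPLVCN"
```

A hit for only one of the two patterns would be weaker evidence than a hit for both, which is why the sketch reports them separately instead of returning a single yes/no.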
[1566] 676. (tsp) Thrombospondin type 1 domain
[1567] [1] Bork P; FEBS Lett 1993;327:125-130.

[1568] 677. Tubulin subunits alpha, beta, and gamma signature
Tubulins [1,2], the major constituent of microtubules, are dimeric proteins which consist of two closely related subunits
(alpha and beta). Tubulin binds two molecules of GTP at two different sites (N and E). At the E (Exchangeable) site,
GTP is hydrolyzed during incorporation into the microtubule. Near the E site is an invariant region rich in glycines which
is found in both chains and which is now [3] said to control the access of the nucleotide to its binding site. A signature
pattern was developed from this region. With the exception of the simple eukaryotes, most species express a variety
of closely related alpha and beta isotypes. In most species there is a third member of the tubulin family: gamma tubulin.
Gamma tubulin is found at microtubule organizing centers (MTOC) such as the spindle poles or the centrosome,
suggesting that it is involved in the minus-end nucleation of microtubule assembly [4].
Consensus pattern: [SAG]-G-G-T-G-[SA]-G

[1] Cleveland D.W., Sullivan K.F. Annu. Rev. Biochem. 54:331-365(1985). [2] Joshi H.C., Cleveland D.W. Cell Motil.
Cytoskeleton 16:159-163(1990). [3] Hesse J., Thierauf M., Ponstingl H. J. Biol. Chem. 262:15472-15475(1987). [4]
Joshi H.C. BioEssays 15:637-643(1993).

[1569] Tubulin-beta mRNA autoregulation signal

The stability of beta-tubulin mRNAs is autoregulated by their own translation product [1]. Unpolymerized tubulin
subunits bind directly (or activate a factor(s) which binds co-translationally) to the nascent N-terminus of beta-tubulin. This
binding is transduced through the adjacent ribosomes to activate an RNAse that degrades the polysome-bound mRNA.
The recognition element has been shown to be the first four amino acids of beta-tubulin: Met-Arg-Glu-Ile. Mutations
to this sequence abolish the autoregulation effect (except for the replacement of Glu by Asp); transposition of this
sequence to an internal region of a polypeptide also suppresses the autoregulatory effect.
Consensus pattern: <M-R-[DE]-[IL]

[1] Cleveland D.W. Trends Biochem. Sci. 13:339-343(1988).
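The leading '<' in the pattern <M-R-[DE]-[IL] restricts the match to the N-terminus, which mirrors the observation that the signal stops working when moved to an internal position. A minimal sketch of such an anchored test follows; the function name and the example sequences are illustrative assumptions rather than material from the disclosure.

```python
import re

# '<' in the consensus pattern becomes '^': the tetrapeptide must be the
# first four residues of the polypeptide.
AUTOREGULATION_SIGNAL = re.compile(r"^MR[DE][IL]")


def has_autoregulation_signal(seq):
    """True if the sequence starts with M-R-[DE]-[IL]."""
    return AUTOREGULATION_SIGNAL.match(seq) is not None


# Illustrative sequences (hypothetical, for demonstration only).
examples = {
    "MREIVHIQ": has_autoregulation_signal("MREIVHIQ"),      # Met-Arg-Glu-Ile at the N-terminus
    "MRDIVHIQ": has_autoregulation_signal("MRDIVHIQ"),      # Glu replaced by Asp
    "MKEIVHIQ": has_autoregulation_signal("MKEIVHIQ"),      # Arg replaced by Lys
    "AAAMREIVHIQ": has_autoregulation_signal("AAAMREIVHIQ"),  # signal moved to an internal position
}
```

Only the first two entries evaluate to True, in line with the text: the Glu to Asp replacement is tolerated, while other substitutions or an internal position are not matched.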

[1570] 678. (tRNA-synt 2c) Aminoacyl-transfer RNA synthetases class-II signatures

Aminoacyl-tRNA synthetases
(EC 6.1.1.-) [1] are a group of enzymes which activate amino acids and transfer them to specific tRNA molecules as
the first step in protein biosynthesis. In prokaryotic organisms there are at least twenty different types of
aminoacyl-tRNA synthetases, one for each different amino acid. In eukaryotes there are generally two aminoacyl-tRNA
synthetases for each different amino acid: one cytosolic form and a mitochondrial form. While all these enzymes have a common
function, they are widely diverse in terms of subunit size and of quaternary structure. The synthetases specific for
alanine, asparagine, aspartic acid, glycine, histidine, lysine, phenylalanine, proline, serine, and threonine are referred
to as class-II synthetases [2 to 6] and probably have a common folding pattern in their catalytic domain for the binding
of ATP and amino acid which is different to the Rossmann fold observed for the class I synthetases [7]. Class-II tRNA
synthetases do not share a high degree of similarity; however, at least three conserved regions are present [2,5,8].
Signature patterns have been derived from two of these regions.

Consensus pattern: [FYH]-R-x-[DE]-x(4,12)-[RH]-x(3)-F-x(3)-[DE]-

Consensus pattern: [GSTALVF]-{DENQHRKP}-[GSTAHLIVMF]-[DE]-R-[LIVMF]-x-[LIVMSTAG]-[LIVMFY]-
[1571] [1] Schimmel P. Annu. Rev. Biochem. 56:125-158(1987). [2] Delarue M., Moras D. BioEssays 15:675-687
(1993). [3] Schimmel P. Trends Biochem. Sci. 16:1-3(1991). [4] Nagel G.M., Doolittle R.F. Proc. Natl. Acad. Sci.
U.S.A. 88:8121-8125(1991). [5] Cusack S., Haertlein M., Leberman R. Nucleic Acids Res. 19:3489-3498(1991). [6] Cusack
S. Biochimie 75:1077-1081(1993). [7] Cusack S., Berthet-Colominas C., Haertlein M., Nassar N., Leberman R. Nature
347:249-255(1990). [8] Leveque F., Plateau P., Dessen P., Blanquet S. Nucleic Acids Res. 18:305-312(1990).
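The class-II assignment given in this entry (alanine, asparagine, aspartic acid, glycine, histidine, lysine, phenylalanine, proline, serine and threonine) can be captured as a simple lookup. The sketch below is illustrative only; the table and function names are assumptions, and the function deliberately does not assert a class for amino acids the entry does not list.

```python
# One-letter codes for the amino acids whose synthetases this entry
# refers to as class II.
CLASS_II_AMINO_ACIDS = {
    "A": "alanine",
    "N": "asparagine",
    "D": "aspartic acid",
    "G": "glycine",
    "H": "histidine",
    "K": "lysine",
    "F": "phenylalanine",
    "P": "proline",
    "S": "serine",
    "T": "threonine",
}


def synthetase_class(residue):
    """Report whether the synthetase for a one-letter amino acid code is listed as class II."""
    if residue.upper() in CLASS_II_AMINO_ACIDS:
        return "class II"
    return "not listed as class II"
```

For example, `synthetase_class("K")` gives "class II", while `synthetase_class("W")` gives "not listed as class II".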
[1572] 679. UBA-domain 

[1573] The UBA-domain (ubiquitin associated domain) is a novel sequence motif found in several proteins having
connections to ubiquitin and the ubiquitination pathway. The structure of the UBA domain consists of a compact three
helix bundle [1]. Number of members: 84

[1574] [1] Structure of a human DNA repair protein UBA domain that interacts with HIV-1 Vpr. Dieckmann T,
Withers-Ward ES, Jarosinski MA, Liu CF, Chen IS, Feigon J; Nat Struct Biol 1998;5:1042-1047.
[1575] 680. UBX domain 

Domain present in ubiquitin-regulatory proteins. Present in FAF1 and Shp1p. Number of members: 19
[1] The UBA domain: a sequence motif present in multiple enzyme classes of the ubiquitination pathway. Hofmann K,
Bucher P; Trends Biochem Sci 1996;21:172-173.

[1576] 681. (UCH) Ubiquitin carboxyl-terminal hydrolases family 1 cysteine active site

Ubiquitin carboxyl-terminal hydrolases (UCH) (deubiquitinating enzymes) [1,2] are thiol proteases that recognize and
hydrolyze the peptide bond at the C-terminal glycine of ubiquitin. These enzymes are involved in the processing of
poly-ubiquitin precursors as well as that of ubiquitinated proteins. There are two distinct families of UCH. The first class
consists of enzymes of about 25 Kd and is currently represented by: - Mammalian isozymes L1 and L3. - Yeast YUH1.
- Drosophila Uch. One of the active site residues of class-I UCH [3] is a cysteine. A signature pattern has been derived
from the region around that residue.

Consensus pattern: Q-x(3)-N-[SA]-C-G-x(3)-[LIVM](2)-H-[SA]-[LIVM]-[SA] [C is the active site residue]

[1] Jentsch S., Seufert W., Hauser H.-P. Biochim. Biophys. Acta 1089:127-139(1991). [2] D'andrea A., Pellman D. Crit.
Rev. Biochem. Mol. Biol. 33:337-352(1998). [3] Johnston S.C., Larsen C.N., Cook W.J., Wilkinson K.D., Hill C.P. EMBO
J. 16:3787-3796(1997). [4] Rawlings N.D., Barrett A.J. Meth. Enzymol. 244:461-486(1994).

[1577] 682. Ubiquitin carboxyl-terminal hydrolases family 2 signatures (UCH-1)

Ubiquitin carboxyl-terminal hydrolases (UCH) (deubiquitinating enzymes) [1,2] are thiol proteases that recognize and
hydrolyze the peptide bond at the C-terminal glycine of ubiquitin. These enzymes are involved in the processing of
poly-ubiquitin precursors as well as that of ubiquitinated proteins. There are two distinct families of UCH. The second
class consists of large proteins (800 to 2000 residues) and is currently represented by: - Yeast UBP1, UBP2, UBP3,
UBP4 (or DOA4/SSV7), UBP5, UBP7, UBP9, UBP10, UBP11, UBP12, UBP13, UBP14, UBP15 and UBP16. - Human
tre-2. - Human isopeptidase T. - Human isopeptidase T-3. - Mammalian Ode-1. - Mammalian Unp. - Mouse Dub-1. -
Drosophila fat facets protein (gene faf). - Mammalian faf homolog. - Drosophila D-Ubp-64E. - Caenorhabditis elegans
hypothetical protein R10E11.3. - Caenorhabditis elegans hypothetical protein K02C4.3. These proteins only share two
regions of similarity. The first region contains a conserved cysteine which is probably implicated in the catalytic
mechanism. The second region contains two conserved histidine residues, one of which is also probably implicated in the
catalytic mechanism. Signature patterns for both conserved regions have been developed.

Consensus pattern: G-[LIVMFY]-x(1,3)-[AGC]-[NASM]-x-C-[FYW]-[LIVMC]-[NST]-[SACV]-x-[LIVMS]-Q [C is the
putative active site residue]

Consensus pattern: Y-x-L-x-[SAG]-[LIVMFT]-x(2)-H-x-G-x(4,5)-G-H-Y [The two H's are putative active site residues]
[1] Jentsch S., Seufert W., Hauser H.-P. Biochim. Biophys. Acta 1089:127-139(1991). [2] D'andrea A., Pellman D. Crit.
Rev. Biochem. Mol. Biol. 33:337-352(1998). [3] Rawlings N.D., Barrett A.J. Meth. Enzymol. 244:461-486(1994).
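Both family 2 patterns contain variable-length gaps, x(1,3) and x(4,5), which translate to bounded repeats in a regular expression; the matcher then tries each permitted gap length. The sketch below, with hand-translated expressions and hypothetical names and test peptides, reports which of the two conserved regions is found in a sequence.

```python
import re

# G-[LIVMFY]-x(1,3)-[AGC]-[NASM]-x-C-[FYW]-[LIVMC]-[NST]-[SACV]-x-[LIVMS]-Q
CYS_REGION = re.compile(r"G[LIVMFY].{1,3}[AGC][NASM].C[FYW][LIVMC][NST][SACV].[LIVMS]Q")

# Y-x-L-x-[SAG]-[LIVMFT]-x(2)-H-x-G-x(4,5)-G-H-Y
HIS_REGION = re.compile(r"Y.L.[SAG][LIVMFT].{2}H.G.{4,5}GHY")


def uch_family2_signatures(seq):
    """Return (cysteine region found, histidine region found) for a sequence."""
    return (CYS_REGION.search(seq) is not None,
            HIS_REGION.search(seq) is not None)


# Hypothetical peptides written by hand; they are not database sequences.
TOY_CYS_BOX = "GLRNLGNTCFMNSILQ"       # uses a 3-residue gap for x(1,3)
TOY_HIS_BOX = "YDLIAVVVHCGSGPSGGHY"    # needs a 5-residue gap for x(4,5)
TOY_ENZYME = "MK" + TOY_CYS_BOX + "AAAAAAAAAA" + TOY_HIS_BOX + "EE"
```

Since the text says the family members share only these two regions, requiring both hits is a natural, though stricter, screening rule than accepting either one.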
[1578] 683. Ubiquitin carboxyl-terminal hydrolases family 2 signatures (UCH-2)

Ubiquitin carboxyl-terminal hydrolases (UCH) (deubiquitinating enzymes) [1,2] are thiol proteases that recognize and
hydrolyze the peptide bond at the C-terminal glycine of ubiquitin. These enzymes are involved in the processing of
poly-ubiquitin precursors as well as that of ubiquitinated proteins. There are two distinct families of UCH. The second
class consists of large proteins (800 to 2000 residues) and is currently represented by: - Yeast UBP1, UBP2, UBP3,
UBP4 (or DOA4/SSV7), UBP5, UBP7, UBP9, UBP10, UBP11, UBP12, UBP13, UBP14, UBP15 and UBP16. - Human
tre-2. - Human isopeptidase T. - Human isopeptidase T-3. - Mammalian Ode-1. - Mammalian Unp. - Mouse Dub-1. -
Drosophila fat facets protein (gene faf). - Mammalian faf homolog. - Drosophila D-Ubp-64E. - Caenorhabditis elegans
hypothetical protein R10E11.3. - Caenorhabditis elegans hypothetical protein K02C4.3. These proteins only share two
regions of similarity. The first region contains a conserved cysteine which is probably implicated in the catalytic
mechanism. The second region contains two conserved histidine residues, one of which is also probably implicated in the
catalytic mechanism. Signature patterns for both conserved regions have been developed.

Consensus pattern: G-[LIVMFY]-x(1,3)-[AGC]-[NASM]-x-C-[FYW]-[LIVMC]-[NST]-[SACV]-x-[LIVMS]-Q [C is the
putative active site residue]

Consensus pattern: Y-x-L-x-[SAG]-[LIVMFT]-x(2)-H-x-G-x(4,5)-G-H-Y [The two H's are putative active site residues]
[1] Jentsch S., Seufert W., Hauser H.-P. Biochim. Biophys. Acta 1089:127-139(1991). [2] D'andrea A., Pellman D. Crit.
Rev. Biochem. Mol. Biol. 33:337-352(1998). [3] Rawlings N.D., Barrett A.J. Meth. Enzymol. 244:461-486(1994).
[1579] 684. UDP-glycosyltransferases signature

UDP glycosyltransferases (UGT) are a superfamily of enzymes that catalyze the addition of the glycosyl group from
a UTP-sugar to a small hydrophobic molecule. This family currently consists of: - Mammalian UDP-glucuronosyl
transferases (UDPGT) [1,2]. A large family of membrane-bound microsomal enzymes which catalyze the transfer of
glucuronic acid to a wide variety of exogenous and endogenous lipophilic substrates. These enzymes are of major
importance in the detoxification and subsequent elimination of xenobiotics such as drugs and carcinogens. - A large
number of putative UDPGT from Caenorhabditis elegans. - Mammalian 2-hydroxyacylsphingosine
1-beta-galactosyltransferase [3] (also known as UDP-galactose-ceramide galactosyltransferase). This enzyme catalyzes the transfer
of galactose to ceramide, a key enzymatic step in the biosynthesis of galactocerebrosides, which are abundant
sphingolipids of the myelin membrane of the central nervous system and peripheral nervous system. - Plants flavonol
O(3)-glucosyltransferase. An enzyme [4] that catalyzes the transfer of glucose from UDP-glucose to a flavanol. This reaction
is essential and one of the last steps in anthocyanin pigment biosynthesis. - Baculoviruses ecdysteroid
UDP-glucosyltransferase (EC 2.4.1.-) [5] (egt). This enzyme catalyzes the transfer of glucose from UDP-glucose to ecdysteroids
which are insect molting hormones. The expression of egt in the insect host interferes with the normal insect
development by blocking the molting process. - Prokaryotic zeaxanthin glucosyl transferase (gene crtX), an enzyme involved
in carotenoid biosynthesis and that catalyses the glycosylation reaction which converts zeaxanthin to
zeaxanthin-beta-diglucoside. - Streptomyces macrolide glycosyltransferases [6]. These enzymes specifically inactivate macrolide
antibiotics via 2'-O-glycosylation using UDP-glucose. These enzymes share a conserved domain of about 60 amino acid
residues located in their C-terminal section and from which a pattern has been extracted to detect them.
Consensus pattern: [FW]-x(2)-Q-x(2)-[LIVMYA]-[LIMV]-x(4,6)-[LVGAC]-[LVFYA]-[LIVMF]-[STAGCM]-[HNQ]-
[STAGC]-G-x(2)-[STAG]-x(3)-[STAGL]-[LIVMFA]-x(4)-[PQR]-[LIVMT]-x(3)-[PA]-x(3)-[DES]-[QEHN]
[1] Dutton G.J. (In) Glucuronidation of drugs and other compounds, Dutton G.J., Ed., pp 1-78, CRC Press, Boca Raton,
(1980). [2] Burchell B., Nebert D.W., Nelson D.R., Bock K.W., Iyanagi T., Jansen P.L., Lancet D., Mulder G.J.,
Chowdhury J.R., Siest G., Tephly T.R., Mackenzie P.I. DNA Cell Biol. 10:487-494(1991). [3] Schulte S., Stoffel W. Proc. Natl.
Acad. Sci. U.S.A. 90:10265-10269(1993). [4] Furtek D., Schiefelbein J.W., Johnston R., Nelson O.E. Jr. Plant Mol. Biol.
11:473-481(1988). [5] O'Reilly D.R., Miller L.K. Science 245:1110-1112(1989). [6] Hernandez C., Olano C., Mendez C.,
Salas J.A. Gene 134:139-140(1993).
[1580] 685. UDP-glucose/GDP-mannose dehydrogenase family 

[1581] The UDP-glucose/GDP-mannose dehydrogenases are a small group of enzymes which possess the
ability to catalyze the NAD-dependent 2-fold oxidation of an alcohol to an acid without the release of an aldehyde
intermediate [2]. Number of members: 55

[1582] [1] Purification and characterization of guanosine diphospho-D-mannose dehydrogenase. A key enzyme in
the biosynthesis of alginate by Pseudomonas aeruginosa. Roychoudhury S, May TB, Gill JF, Singh SK, Feingold DS,
Chakrabarty AM; J Biol Chem 1989;264:9380-9385. [2] Properties and kinetic analysis of UDP-glucose dehydrogenase
from group A streptococci. Irreversible inhibition by UDP-chloroacetol. Campbell RE, Sala RF, van de Rijn I, Tanner
ME; J Biol Chem 1997;272:3416-3422.
[1583] 686. Uracil-DNA glycosylase signature

Uracil-DNA glycosylase (EC 3.2.2.-) (UNG) [1] is a DNA repair enzyme that excises uracil residues from DNA by
cleaving the N-glycosylic bond. Uracil in DNA can arise as a result of misincorporation of dUMP residues by DNA
polymerase or deamination of cytosine. The sequence of uracil-DNA glycosylase is extremely well conserved [2] in
bacteria and eukaryotes as well as in herpes viruses. More distantly related uracil-DNA glycosylases are also found
in poxviruses [3]. In eukaryotic cells, UNG activity is found in both the nucleus and the mitochondria. Human UNG1
protein is transported to both the mitochondria and the nucleus [4]. The N-terminal 77 amino acids of UNG1 seem to
be required for mitochondrial localization [4], but the presence of a mitochondrial transit peptide has not been directly
demonstrated. As a signature for this type of enzyme, the most N-terminal conserved region has been selected. This
region contains an aspartic acid residue which has been proposed, based on X-ray structures [5,6], to act as a general
base in the catalytic mechanism.

Consensus pattern: [KR]-[LIV]-[LIVC]-[LIVM]-x-G-[QI]-D-P-Y [D is the active site residue]

[1] Sancar A., Sancar G.B. Annu. Rev. Biochem. 57:29-67(1988). [2] Olsen L.C., Aasland R., Wittwer C.U., Krokan
H.E., Helland D.E. EMBO J. 8:3121-3125(1989). [3] Upton C., Stuart D.T., McFadden G. Proc. Natl. Acad. Sci.
U.S.A. 90:4518-4522(1993). [4] Slupphaug G., Markussen R-H., Olsen L.C., Aasland R., Aarsaether N., Bakke O., Krokan
H.E., Helland D.E. Nucleic Acids Res. 21:2579-2584(1993). [5] Savva R., McAuley-Hecht K., Brown T., Pearl L. Nature
373:487-493(1995). [6] Mol C.D., Arvai A.S., Slupphaug G., Kavli B., Alseth I., Krokan H.E., Tainer J.A. Cell 80:869-878
(1995). [7] Muller S.J., Caradonna S. Biochim. Biophys. Acta 1088:197-207(1991). [8] Meyer-Siegler K., Mauro D.J.,
Seal G., Wurzer J., Deriel J.K., Sirover M.A. Proc. Natl. Acad. Sci. U.S.A. 88:8460-8464(1991). [9] Muller S.J.,
Caradonna S. J. Biol. Chem. 268:1310-1319(1993). [10] Barnes D.E., Lindahl T., Sedgwick B. Curr. Opin. Cell Biol. 5:424-433
(1993).

[1584] 687. Uncharacterized protein family UPF0001 signature 

The following uncharacterized proteins have been shown [1] to share regions of similarities:

- Yeast chromosome II hypothetical protein YBL036c. - Caenorhabditis elegans hypothetical protein F09E5.8. -
Bacillus subtilis hypothetical protein ylmE. - Escherichia coli hypothetical protein yggS and HI0090, the
corresponding Haemophilus influenzae protein. - Helicobacter pylori hypothetical protein HP0395. - Mycobacterium
tuberculosis hypothetical protein MtCY270.20. - Synechocystis strain PCC 6803 hypothetical protein slr0556. - A
Pseudomonas aeruginosa hypothetical protein in pilT 5'region. - A Vibrio alginolyticus hypothetical protein in pilT
5'region. These are proteins of from 25 to 30 Kd which contain a number of conserved regions. The best conserved
region, which is located in the first third of these proteins, has been selected as a signature pattern.

Consensus pattern: [FW]-H-[FM]-[IV]-G-x-[LIV]-Q-x-[NKR]-K-x(3)-[LIV]
[1] Bairoch A., Rudd K.E. Unpublished observations (1996).

[1585] 688. Uncharacterized protein family UPF0003 signature

The following uncharacterized proteins have been shown [1] to share regions of similarities: - Escherichia coli protein
aefA. - Escherichia coli hypothetical protein yggB. - Escherichia coli hypothetical protein yjeP and HI0195.1, the
corresponding Haemophilus influenzae protein. - Escherichia coli hypothetical protein ynaI. - Bacillus subtilis hypothetical
protein yhdY. - Helicobacter pylori hypothetical protein HP0415. - Synechocystis strain PCC 6803 hypothetical protein
slr0639. - Archaeoglobus fulgidus hypothetical protein AF1546. - Methanococcus jannaschii hypothetical protein
MJ0170. - Methanococcus jannaschii hypothetical protein MJ1143. The size of these proteins ranges from 30 to 120 Kd.
They all contain a number of transmembrane regions. The best conserved region, which is located in and just after the
last potential transmembrane region, has been selected as a signature pattern.

Consensus pattern: G-[STIF]-V-x(2)-[LIVM]-x(6)-[LIVMF]-x(3)-[DQ]-x(3)-[LIV]-x-[LIV]-P-N-x(2)-[LIVMF]-[LIVFSTA]-x(5)-N

[1] Bairoch A. Unpublished observations (1997).

[1586] 689. Uncharacterized protein family UPF0004 signature

The following uncharacterized proteins have been shown [1] to share regions of similarities: - Escherichia coli
hypothetical protein yliG. - Escherichia coli hypothetical protein yleA and HI0019, the corresponding Haemophilus influenzae
protein. - Bacillus subtilis hypothetical protein yqeV. - Helicobacter pylori hypothetical protein HP0269. - Helicobacter
pylori hypothetical protein HP0285. - Mycoplasma iowae hypothetical protein in 16S RNA 5'region. - Mycobacterium
leprae hypothetical protein B2235_C2_195. - Pseudomonas aeruginosa hypothetical protein in hemL 3'region. -
Synechocystis strain PCC 6803 hypothetical protein slr0082. - Synechocystis strain PCC 6803 hypothetical protein
sll0996. - Methanococcus jannaschii hypothetical protein MJ0865. - Methanococcus jannaschii hypothetical protein
MJ0867. - Caenorhabditis elegans hypothetical protein F25B5.5. The size of these proteins ranges from 47 to 61 Kd.
They contain six conserved cysteines, three of which are clustered in a region that can be used as a signature pattern.

Consensus pattern: [LIVM]-x-[LIVMT]-x(2)-G-C-x(3)-C-[STAN]-[FY]-C-x-[LIVM]-x(4)-G
[1] Bairoch A. Unpublished observations (1997).
[1587] 690. Uncharacterized protein family UPF0005 signature 

The following proteins seem to be evolutionarily related [1]: - Mammalian protein TEGT (Testis Enhanced Gene
Transcript). - Escherichia coli hypothetical protein yccA and HI0044, the corresponding Haemophilus influenzae protein. -
A probable Pseudomonas aeruginosa ortholog of yccA. These are proteins of about 25 Kd which seem to contain
seven transmembrane domains. A signature pattern that corresponds to a region that starts with the beginning of the
third transmembrane domain and ends in the middle of the fourth one has been developed.

Consensus pattern: G-[LIVM](2)-[SA]-x(5,8)-G-x(2)-[LIVM]-G-P-x-L-x(4)-[SAG]-x(4,6)-[LIVM](2)-x(2)-A-x(3)-T-A-
[LIVM](2)-F

[1] Walter L., Marynen P., Szpirer J., Levan G., Guenther E. Genomics 28:301-304(1995).
[1588] 691. Uncharacterized protein family UPF0006 signatures 

The following uncharacterized proteins have been shown [1] to share regions of similarities: - Yeast chromosome II
hypothetical protein YBL055C. - Escherichia coli hypothetical protein ycfH and HI0454, the corresponding Haemophilus
influenzae protein. - Escherichia coli hypothetical protein yigW. - Escherichia coli hypothetical protein yjjV and HI0081,
the corresponding Haemophilus influenzae protein. - Bacillus subtilis hypothetical protein yabD. - Haemophilus
influenzae hypothetical protein HI1664. - Mycoplasma genitalium hypothetical protein MG009. These are proteins of from
24 to 47 Kd which contain a number of conserved regions. They can be picked up in the database by the following
patterns.

Consensus pattern: [LIVMFY](2)-D-[STA]-H-x-H-[LIVMF]-[DN]
Consensus pattern: P-[LIVM]-x-[LIVM]-H-x-R-x-[TA]-x-[DE]

Consensus pattern: [LVSA]-[LIVA]-x(2)-[LIVM]-[PS]-x(3)-L-[LIVM]-[LIVMS]-E-T-D-x-P
[1] Bairoch A., Rudd K.E. Unpublished observations (1995).
[1589] 692. Uncharacterized protein family UPF0007 signature 

The following proteins seem to be evolutionarily related [1]: - Escherichia coli hypothetical protein ygbP and HI0672,
the corresponding Haemophilus influenzae protein. - Bacillus subtilis hypothetical protein yacM. - Mycobacterium
tuberculosis hypothetical protein MtCY06G11.29c. - Synechocystis strain PCC 6803 hypothetical protein slr0951. - A
Rhodobacter capsulatus hypothetical protein in nifR3 5'region. Except for the Rhodobacter protein which contains a
C-terminal extension, all these proteins have from 225 to 236 amino acids. They are hydrophilic proteins that can be
picked up in the database by the following pattern.
Consensus pattern: V-L-[IV]-H-D-[GA]-A-R

[ 1] Bairoch A. Unpublished observations (1997). 

[1590] 693. Uncharacterized protein family UPF0015 signature 

The following uncharacterized proteins have been shown [1] to share regions of similarities: - Yeast chromosome II
hypothetical protein YBR002c. - Yeast chromosome XIII hypothetical protein YMR101C. - Escherichia coli hypothetical
protein yaeU and HI0920, the corresponding Haemophilus influenzae protein. - Helicobacter pylori hypothetical protein
HP1721. - Mycobacterium leprae hypothetical protein B1937_F2_65. - A Corynebacterium glutamicum hypothetical
protein in aroF 3'region. - A Streptomyces fradiae hypothetical protein in transposon Tn4556. - Synechocystis strain
PCC 6803 hypothetical protein sll0505. - Methanococcus jannaschii hypothetical protein MJ1372. These are proteins
of about 26 to 40 Kd whose central region is well conserved. They can be picked up in the database by the following
pattern.

Consensus pattern: [DE]-[LIVMF](3)-R-T-[SG]-G-x(2)-R-x-S-x-[FY]-[LIVM](2)-W-Q

[1] Wolfe K.H., Lohan A.J.E. Yeast 10:S41-S46(1994).

[1591] 694. Uncharacterized protein family UPF0016 signature
The following uncharacterized proteins have been shown [1] to share regions of similarities: - Yeast hypothetical protein
YBR187W. - Fission yeast hypothetical protein SpAC17G8.08c. - Mouse protein pFT27. - Synechocystis strain PCC
6803 hypothetical protein sll0615. These are hydrophobic proteins of 200 to 320 amino acids that seem to contain six
or seven transmembrane domains. A conserved region which seems, in the eukaryotic proteins of this family, to directly
follow the second transmembrane domain has been selected as a signature pattern.
Consensus pattern: E-[LIVM]-G-D-K-T-F-[LIVMF](2)-A-
[1] Bairoch A. Unpublished observations (1996).
[1592] 695. Uncharacterized protein family UPF0021 signature 

The following uncharacterized proteins have been shown [1] to share regions of similarities: - Yeast chromosome VII
hypothetical protein YGL211w. - Dictyostelium discoideum protein veg136. - Methanococcus jannaschii hypothetical
proteins MJ1157 and MJ1478. These are proteins of from 300 to 360 residues. They can be picked up in the database
by the following pattern which is located in their N-terminal section.

Consensus pattern: C-K-x(2)-F-x(4)-E-x(22,23)-S-G-G-K-D

[ 1] Bairoch A. Unpublished observations (1997). 

[1593] 696. Uncharacterized protein family UPF0023 signature

The following uncharacterized proteins have been shown [1] to share regions of similarities: - Mouse protein 22A3. -
Yeast chromosome XII hypothetical protein YLR022c. - Caenorhabditis elegans hypothetical protein W06E11.4. -
Methanococcus jannaschii hypothetical protein MJ0592. These are hydrophilic proteins of about 30 Kd. They can be picked
up in the database by the following pattern.

[1594] Consensus pattern: D-x-D-E-[LIV]-L-x(4)-V-F-x(3)-S-K-G-
[1595] [1] Bairoch A. Unpublished observations (1997).

[1596] 697. Uncharacterized protein family UPF0024 signature. The following uncharacterized proteins have been
shown [1] to share regions of similarities: - Escherichia coli hypothetical protein ygbO and HI0701, the corresponding
Haemophilus influenzae protein. - Helicobacter pylori hypothetical protein HP0926. - Yeast chromosome XV hypothetical
protein YOR243c. - Caenorhabditis elegans hypothetical protein B0024.11. - Methanococcus jannaschii hypothetical
proteins MJ0588 and MJ1364. These are hydrophilic proteins of from 39 to 77 Kd. They can be picked up in the
database by the following pattern.

[1597] Consensus pattern: G-x-K-D-[KR]-x-A-[LV]-T-x-Q-x-[LIVF]-[SGC]-
[ 1] Bairoch A. Unpublished observations (1997). 

[1598] 698. Uncharacterized protein family UPF0025 signature

The following uncharacterized proteins have been shown [1] to share regions of similarities: - Escherichia coli hypothetical
protein yfcE. - Bacillus subtilis hypothetical protein ysnB. - Mycoplasma genitalium and pneumoniae hypothetical
protein MG207. - Methanococcus jannaschii hypothetical proteins MJ0623 and MJ0936. These are hydrophilic
proteins of about 20 Kd. They can be picked up in the database by the following pattern.

Consensus pattern: D-V-[LIV]-x(2)-G-H-[ST]-H-x(12)-[LIVMF]-N-P-G
[ 1] Bairoch A. Unpublished observations (1997). 
[1599] 699. Uncharacterized protein family UPF0029 signature 

The following uncharacterized proteins have been shown [1] to share regions of similarities: - Yeast chromosome III
hypothetical protein YCR59c. - Yeast chromosome IV hypothetical protein YDL177C. - Escherichia coli hypothetical
protein yigZ and HI0722, the corresponding Haemophilus influenzae protein. - Bacillus subtilis hypothetical protein
yvyE. - A Thermus aquaticus hypothetical protein in pol 5' region. These proteins can be picked up in the database by
the following pattern.

Consensus pattern: G-x(2)-[LIVM](2)-x(2)-[LIVM]-x(4)-[LIVM]-x(5)-[LIVM](2)-x-R-[FYW](2)-G-G-x(2)-[LIVM]-G

[ 1] Koonin E.V., Bork P., Sander C. EMBO J. 13:493-503(1994). 

[1600] 700. Uncharacterized protein family UPF0030 signature

The following uncharacterized proteins have been shown [1] to be highly similar: - Yeast chromosome VI hypothetical
protein YFL060c. - Yeast chromosome XIII hypothetical protein YMR095c. - Yeast chromosome XIV hypothetical protein
YNL334C. - Bacillus subtilis hypothetical protein yaaE. - Haemophilus influenzae hypothetical protein HI1648. -
Methanococcus jannaschii hypothetical protein MJ1661. These are hydrophilic proteins of about 19 to 25 Kd. They can be
picked up in the database by the following pattern.

Consensus pattern: [GA]-L-I-[LIV]-P-G-G-E-S-T-[STA]

[ 1] Bairoch A. Unpublished observations (1997). 

[1601] 701. Uncharacterized protein family UPF0032 signature

The following uncharacterized proteins have been shown [1] to share regions of similarities: - Escherichia coli hypothetical
protein yigU and HI0188, the corresponding Haemophilus influenzae protein. - Bacillus subtilis hypothetical
protein ycbT. - Mycobacterium tuberculosis hypothetical protein MtCY49.33c and U2126A, the corresponding Mycobacterium
leprae protein. - Synechocystis strain PCC 6803 hypothetical protein sll0194. - Odontella sinensis and Porphyra
purpurea chloroplast hypothetical protein ycf43. These proteins have from 245 to 317 amino acids and seem to
contain at least six or seven transmembrane regions. A conserved region located in the central section of these proteins
has been developed as a signature pattern.

Consensus pattern: Y-x(2)-F-[LIVMA](2)-x-L-x(4)-G-x(2)-F-[EQ]-[LIVMF]-P-[LIVM]
[ 1] Bairoch A., Rudd K.E. Unpublished observations (1996).

[1602] 702. Uncharacterized protein family UPF0034 signature

The following uncharacterized proteins have been shown [1] to share regions of similarities: - Escherichia coli hypothetical
protein yhdG and HI0979, the corresponding Haemophilus influenzae protein. - Escherichia coli hypothetical
protein yjbN and HI0634, the corresponding Haemophilus influenzae protein. - Escherichia coli hypothetical protein
yohI and HI0270, the corresponding Haemophilus influenzae protein. - Bacillus subtilis hypothetical protein yacF. -
Rhodobacter capsulatus protein nifR3 and related proteins in Azospirillum brasilense and Rhizobium leguminosarum.
- Synechocystis strain PCC 6803 hypothetical protein slr0644. - Synechocystis strain PCC 6803 hypothetical protein
sll0926. - Caenorhabditis elegans hypothetical protein C45G9.2. - Yeast protein SMM1. - Yeast hypothetical protein
YLR401C. - Yeast hypothetical protein YLR405w. - Yeast hypothetical protein YML080w. Although it has been proposed
[2] that Rhodobacter capsulatus nifR3 is a transcriptional regulatory protein, it is believed that these proteins constitute
a family of enzymes whose active site could include a conserved cysteine which has been used as the central part of
a signature pattern.

Consensus pattern: [LIVM]-[DNG]-[LIVM]-N-x-G-C-P-x(3)-[LIVMASQ]-x(5)-G-[SAC]

[ 1] Bairoch A., Rudd K.E. Unpublished observations (1995).
[ 2] Foster-Hartnett D., Cullen P.J., Gabbert K.K., Kranz R.G. Mol. Microbiol. 8:903-914(1993).

[1603] 703. Uncharacterized protein family UPF0038 signature

The following uncharacterized proteins have been shown [1] to share regions of similarities: - Escherichia coli hypothetical
protein yacE and HI0890, the corresponding Haemophilus influenzae protein. - Mycobacterium tuberculosis
hypothetical protein MtCY01B2.23 and O410, the corresponding Mycobacterium leprae protein. - Synechocystis strain
PCC 6803 hypothetical protein slr0553. - Other hypothetical proteins from Aeromonas hydrophila, Bacteroides nodosus,
Neisseria gonorrhoeae, Pseudomonas putida, Thermus thermophilus and Xanthomonas campestris. - Human
hypothetical protein pOV-2. - Yeast hypothetical protein YDR196C. - Caenorhabditis elegans hypothetical protein
T05G5.5. These proteins all contain, in their N-terminal extremity, an ATP/GTP-binding motif 'A' (P-loop) (see
<PDOC00017>). The size of these proteins ranges from 200 to 290 residues (with the exception of the Mycobacterial
sequences which are 410 residues long). A conserved region some 50 residues away from the ATP-binding P-loop
has been developed as a signature pattern.

Consensus pattern: G-x-[LI]-x-R-x(2)-L-x(4)-F-x(8)-[LIV]-x(5)-P-x-[LIV]
[ 1] Rudd K.E., Bairoch A. Unpublished observations (1997).

[1604] 704. Ubiquitin-conjugating enzymes active site 

Ubiquitin-conjugating enzymes (UBC or E2 enzymes) [1,2,3] catalyze the covalent attachment of ubiquitin to target
proteins. An activated ubiquitin moiety is transferred from an ubiquitin-activating enzyme (E1) to E2 which later ligates
ubiquitin directly to substrate proteins with or without the assistance of 'N-end' recognizing proteins (E3). In most
species there are many forms of UBC (at least 9 in yeast) which are implicated in diverse cellular functions. A cysteine
residue is required for ubiquitin-thiolester formation. There is a single conserved cysteine in UBC's and the region
around that residue is conserved in the sequence of known UBC isozymes. That region has been used as a signature
pattern.

Consensus pattern: [FYWLSP]-H-[PC]-[NH]-[LIV]-x(3,4)-G-x-[LIV]-C-[LIV]-x-[LIV] [C is the active site residue]
[ 1] Jentsch S., Seufert W., Sommer T., Reins H.-A. Trends Biochem. Sci. 15:195-198(1990).
[ 2] Jentsch S., Seufert W., Hauser H.-P. Biochim. Biophys. Acta 1089:127-139(1991).
[ 3] Hershko A. Trends Biochem. Sci. 16:265-268(1991).
[1605] 705. Uroporphyrinogen decarboxylase signatures 

Uroporphyrinogen decarboxylase (URO-D), the fifth enzyme of the heme biosynthetic pathway, catalyzes the sequential
decarboxylation of the four acetyl side chains of uroporphyrinogen to yield coproporphyrinogen [1]. URO-D deficiency
is responsible for the human genetic diseases familial porphyria cutanea tarda (fPCT) and hepatoerythropoietic porphyria
(HEP). The sequence of URO-D has been well conserved throughout evolution. The best conserved region is
located in the N-terminal section; it contains a perfectly conserved hexapeptide. There are two arginine residues in this
hexapeptide which could be involved in the binding, via salt bridges, to the carboxyl groups of the propionate side chains
of the substrate. This region has been used as a signature pattern. A second signature pattern is based on another
well conserved region which is located in the central section of the protein.
Consensus pattern: P-x-W-x-M-R-Q-A-G-R 

Consensus pattern: G-F-[STAGCV]-[STAGC]-x-P-[FYW]-T-[LV]-x(2)-Y-x(2)-[AE]-[GK] 
[ 1] Garey J.R., Labbe-Bois R., Chelstowska A., Rytka J., Harrison L., Kushner J., Labbe P. Eur. J. Biochem. 205:
1011-1016(1992).
[1606] 706. ubiE/COQ5 methyltransferase family signatures

The following methyltransferases have been shown [1] to share regions of similarities: - Escherichia coli ubiE, which
is involved in both ubiquinone and menaquinone biosynthesis and which catalyzes the S-adenosylmethionine dependent
methylation of 2-polyprenyl-6-methoxy-1,4-benzoquinol into 2-polyprenyl-3-methyl-6-methoxy-1,4-benzoquinol
and of demethylmenaquinol into menaquinol. - Yeast COQ5, a ubiquinone biosynthesis methyltransferase. - Bacillus
subtilis spore germination protein C2 (gene: gercB or gerC2), a probable menaquinone biosynthesis methyltransferase.
- Lactococcus lactis gerC2 homolog. - Caenorhabditis elegans hypothetical protein ZK652.9. - Leishmania donovani
amastigote-specific protein A41. These are hydrophilic proteins of about 30 Kd (except for ZK652.9 which is 65 Kd).
They can be picked up in the database by the following patterns.
Consensus pattern: Y-D-x-M-N-x(2)-[LIVM]-S-x(3)-H-x(2)-W
Consensus pattern: R-V-[LIVM]-K-[PV]-G-G-x-[LIVMF]-x(2)-[LIVM]-E-x-S

[ 1] Lee P.T., Hsu A.Y., Ha H.T., Clarke C.F. J. Bacteriol. 179:1748-1754(1997).
[1607] 707. Uricase signature 

Uricase (urate oxidase) [1] is the peroxisomal enzyme responsible for the degradation of urate into allantoin. Some
species, like primates and birds, have lost the gene for uricase and are therefore unable to degrade urate. Uricase is
a protein of 300 to 400 amino acids. A highly conserved region located in the central part of the sequence has been
used as a signature pattern.

Consensus pattern: [LV]-x-[LV]-[LIV]-K-[STV]-[ST]-x-[SN]-x-F-x(2)-[FY]-x(4)-[FY]-x(2)-L-x(5)-R
[ 1] Motojima K., Kanaya S., Goto S. J. Biol. Chem. 263:16677-16681(1988).
[1608] 708. Universal stress protein family (Usp) 

[1609] By a wide range of stress conditions members of the Usp family are predicted to be related to the MADS-box
transcription factor proteins and bind to DNA [2]. Number of members: 39

[1] Expression and role of the universal stress protein, UspA, of Escherichia coli during growth arrest. Nystrom T,
Neidhardt FC; Mol Microbiol 1994; 11:537-544.

[2] Sequence analysis of eukaryotic developmental proteins: ancient and novel domains. Mushegian AR, Koonin
EV; Genetics 1996; 144:817-828.

[1610] 709. Ubiquitin domain signature and profile 

Ubiquitin [1,2,3] is a protein of seventy six amino acid residues, found in all eukaryotic cells and whose sequence is
extremely well conserved from protozoan to vertebrates. It plays a key role in a variety of cellular processes, such as
ATP-dependent selective degradation of cellular proteins, maintenance of chromatin structure, regulation of gene expression,
stress response and ribosome biogenesis. In most species, there are many genes coding for ubiquitin. However
they can be classified into two classes. The first class produces polyubiquitin molecules consisting of exact head
to tail repeats of ubiquitin. The number of repeats is variable (up to twelve in a Xenopus gene). In the majority of
polyubiquitin precursors, there is a final amino-acid after the last repeat. The second class of genes produces precursor
proteins consisting of a single copy of ubiquitin fused to a C-terminal extension protein (CEP). There are two types of
CEP proteins and both seem to be ribosomal proteins. Ubiquitin is a globular protein, the last four C-terminal residues
(Leu-Arg-Gly-Gly) extending from the compact structure to form a 'tail', important for its function. The latter is mediated
by the covalent conjugation of ubiquitin to target proteins, by an isopeptide linkage between the C-terminal glycine and
the epsilon amino group of lysine residues in the target proteins. There are a number of proteins which are evolutionary
related to ubiquitin:

- Ubiquitin-like proteins from baculoviruses as well as in some strains of bovine viral diarrhea viruses (BVDV). These
proteins are highly similar to their eukaryotic counterparts.
- Mammalian protein GDX [4]. GDX is composed of two domains, an N-terminal ubiquitin-like domain of 74 residues and
a C-terminal domain of 83 residues with some similarity with the thyroglobulin hormonogenic site.
- Mammalian protein FAU [5]. FAU is a fusion protein which consists of an N-terminal ubiquitin-like protein of 74
residues fused to ribosomal protein S30.
- Mouse protein NEDD-8 [6], a ubiquitin-like protein of 81 residues.
- Human protein BAT3, a large fusion protein of 1132 residues that contains an N-terminal ubiquitin-like domain.
- Caenorhabditis elegans protein ubl-1 [7]. Ubl-1 is a fusion protein which consists of an N-terminal ubiquitin-like
protein of 70 residues fused to ribosomal protein S27A.
- Yeast DNA repair protein RAD23 [8]. RAD23 contains an N-terminal domain that seems to be distantly, yet significantly,
related to ubiquitin.
- Mammalian RAD23-related proteins RAD23A and RAD23B.
- Mammalian BCL-2 binding athanogene-1 (BAG-1). BAG-1 is a protein of 274 residues that contains a central
ubiquitin-like domain.
- Human spliceosome associated protein 114 (SAP 114 or SF3A120).
- Yeast protein DSK2, a protein involved in spindle pole body duplication and which contains an N-terminal
ubiquitin-like domain.
- Human protein CKAP1/TFCB, Schizosaccharomyces pombe protein alp11 and Caenorhabditis elegans hypothetical
protein F53F4.3. These proteins contain an N-terminal ubiquitin domain and a C-terminal CAP-Gly domain.
- Schizosaccharomyces pombe hypothetical protein SpAC26A3.16. This protein contains an N-terminal ubiquitin domain.
- Yeast protein SMT3.
- Human ubiquitin-like proteins SMT3A and SMT3B.
- Human ubiquitin-like protein SMT3C (also known as PIC1; Ubl1; Sumo-1; Gmp-1 or Sentrin). This protein is involved
in targeting ranGAP1 to the nuclear pore complex protein ranBP2.
- SMT3-like proteins in plants and Caenorhabditis elegans.

To identify ubiquitin and related proteins, a pattern has been developed based on conserved positions in the central
section of the sequence. A profile was also developed that spans the complete length of the ubiquitin domain.
Consensus pattern: K-x(2)-[LIVM]-x-[DESAK]-x(3)-[LIV^ 

[ 1] Jentsch S., Seufert W., Hauser H.-P. Biochim. Biophys. Acta 1089:127-139(1991).
[ 2] Monia B.P., Ecker D.J., Crooke S.T. Bio/Technology 8:209-215(1990).
[ 3] Finley D., Varshavsky A. Trends Biochem. Sci. 10:343-347(1985).
[ 4] Filippi M., Tribioli C., Toniolo D. Genomics 7:453-457(1990).
[ 5] Olvera J., Wool I.G. J. Biol. Chem. 268:17967-17974(1993).
[ 6] Kumar S., Yoshida Y., Noda M. Biochem. Biophys. Res. Commun. 195:393-399(1993).
[ 7] Jones D., Candido E.P. J. Biol. Chem. 268:19545-19551(1993).
[ 8] Melnick L., Sherman F. J. Mol. Biol. 233:372-388(1993).
[1611] 710. VHS domain 

[1612] Domain present in VPS-27, Hrs and STAM. Number of members: 27 

[1613] 711. Vinculin family signatures

Vinculin [1] is a eukaryotic protein that seems to be involved in the attachment of the actin-based microfilaments to the
plasma membrane. Vinculin is located at the cytoplasmic side of focal contacts or adhesion plaques. In addition to actin,
vinculin interacts with other structural proteins such as talin and alpha-actinins. Vinculin is a large protein of 116 Kd
(about 1000 residues). Structurally the protein consists of an acidic N-terminal domain of about 90 Kd separated
from a basic C-terminal domain of about 25 Kd by a proline-rich region of about 50 residues. The central part of the
N-terminal domain consists of a variable number (3 in vertebrates, 2 in Caenorhabditis elegans) of repeats of a 110
amino acid domain. Catenins [2] are proteins that associate with the cytoplasmic domain of a variety of cadherins.
The association of catenins to cadherins produces a complex which is linked to the actin filament network, and which
seems to be of primary importance for the cell-adhesion properties of cadherins. Three different types of catenins seem to
exist: alpha, beta, and gamma. Alpha-catenins are proteins of about 100 Kd which are evolutionary related to vinculin.
In terms of their structure the most significant differences are the absence, in alpha-catenin, of the repeated domain and
of the proline-rich segment. Two signature patterns for this family of proteins have been developed. The first pattern is
located in the N-terminal section of both vinculin and alpha-catenins and is part, in vinculin, of a domain that seems to
be involved with the interaction with talin. The second pattern is based on a conserved region in the N-terminal part of
the repeated domain of vinculin.

Consensus pattern: [KR]-x-[LIVMF]-x(3)-[LIVMA]-x(2)-[LIVM]-x(6)-R-Q-Q-E-L
Consensus pattern: [LIVM]-x-[QA]-A-x(2)-W-[IL]-x-[DN]-P

[ 1] Otto J.J. Cell Motil. Cytoskeleton 16:1-6(1990).
[ 2] Herrenknecht K., Ozawa M., Eckerskorn C., Lottspeich F., Lenter M., Kemler R. Proc. Natl. Acad. Sci. U.S.A. 88:9156-9160(1991).
[1614] 712. (Vitellogenin N) Lipoprotein amino terminal region

[1615] This family contains regions from: Vitellogenin, Microsomal triglyceride transfer protein and apolipoprotein B-100.
These proteins are all involved in lipid transport [1]. This family contains the LV1n chain from lipovitellin, that
contains two structural domains. Number of members: 33

[1616] [1] The structural basis of lipid interactions in lipovitellin, a soluble lipoprotein. Anderson TA, Levitt DG,
Banaszak LJ; Structure 1998;6:895-909.

[1617] 713. (VMSA) Major surface antigen from hepadnavirus 

[1618] 714. ssDNA binding protein (Viral DNA bp) 

This protein is found in herpesviruses and is needed for replication. 

[1619] 715. (Voltage CLC) Voltage gated chloride channels
[1620] This family of ion channels contains 10 or 12 transmembrane helices. Each protein forms a single pore. It
has been shown that some members of this family form homodimers. These proteins contain two CBS domains.

[1] Schmidt-Rose T, Jentsch TJ; J Biol Chem 1997;272:20515-20521.

[2] Zhang J, George AL Jr, Griggs RC, Fouad GT, Roberts J, Kwiecinski H, Connolly AM, Ptacek LJ; Neurology
1996;47:993-998.

[1621] 716. von Willebrand factor type A domain (vwa) 

More von Willebrand factor type A domains? Sequence similarities with malaria thrombospondin-related anonymous
protein, dihydropyridine-sensitive calcium channel and inter-alpha-trypsin inhibitor.
Bork P, Rohde K;
Biochem J 1991;279:908-911.

1. RUGGERI, Z.M. and WARE, J.
von Willebrand factor.
FASEB J. 7 308-316 (1993).

2. COLOMBATTI, A., BONALDO, P. and DOLIANA, R.
Type A modules: interacting domains found in several non-fibrillar collagens and in other extracellular matrix proteins.
MATRIX 13 297-306 (1993).

3. PERKINS, S.J., SMITH, K.F., WILLIAMS, S.C., HARIS, P.I., CHAPMAN, D. and SIM, R.B.
The secondary structure of the von Willebrand factor type A domain in factor B of human complement by Fourier
transform infrared spectroscopy. Its occurrence in collagen types VI, VII, XII and XIV, the integrins and other
proteins by averaged structure predictions.
J.MOL.BIOL. 238 104-119 (1994).

4. BORK, P. and ROHDE, K.
More von Willebrand factor type A domains? Sequence similarities with malaria thrombospondin-related anonymous
protein, dihydropyridine-sensitive calcium channel and inter-alpha-trypsin inhibitor.
BIOCHEM.J. 279 908-910 (1991).

5. EDWARDS, Y.J.K. and PERKINS, S.J.
The protein fold of the von Willebrand factor type A domain is predicted to be similar to the open twisted
beta-sheet flanked by alpha-helices found in human ras-p21.
FEBS LETT. 358 283-286 (1995).

6. LEE, J.O., RIEU, P., ARNAOUT, M.A. and LIDDINGTON, R.
Crystal structure of the A domain from the alpha subunit of integrin CR3 (CD11b/CD18).
CELL 80 631-638 (1995).

7. QU, A. and LEAHY, D.J.
Crystal structure of the I-domain from the CD11a/CD18 (LFA-1, alpha L beta 2) integrin.
PROC.NATL.ACAD.SCI.USA 92 10277-10281 (1995).

[1622] The von Willebrand factor is a large multimeric glycoprotein found in blood plasma. Mutant forms are involved
in the aetiology of bleeding disorders [1]. In von Willebrand factor, the type A domain (vWF) is the prototype for a protein
superfamily. The vWF domain is found in various plasma proteins; complement factors B, C2, CR3 and CR4; the
integrins (I-domains); collagen types VI, VII, XII and XIV; and other extracellular proteins [2-4]. Proteins that incorporate
vWF domains participate in numerous biological events (e.g., cell adhesion, migration, homing, pattern formation, and
signal transduction), involving interaction with a large array of ligands [2]. Secondary structure prediction from 75
aligned vWF sequences has revealed a largely alternating sequence of alpha-helices and beta-strands [3]. Fold recognition
algorithms were used to score sequence compatibility with a library of known structures: the vWF domain fold
was predicted to be a doubly-wound, open, twisted beta-sheet flanked by alpha-helices [5]. 3D structures have been
determined for the I-domains of integrins CD11b (with bound magnesium) [6] and CD11a (with bound manganese) [7].
The domain adopts a classic alpha/beta Rossmann fold and contains an unusual metal ion coordination site at its
surface. It has been suggested that this site represents a general metal ion-dependent adhesion site (MIDAS) for
binding protein ligands [6]. The residues constituting the MIDAS motif in the CD11b and CD11a I-domains are completely
conserved, but the manner in which the metal ion is coordinated differs slightly [7].

[1623] VWFADOMAIN is a 3-element fingerprint that provides a signature for the vWF domain superfamily. The
fingerprint was derived from an initial alignment of 14 sequences. Motif 1 includes the first beta-strand and 3 conserved
residues involved in metal ion coordination in I-domains (Asp and 2 serines in positions 8, 10 and 12, respectively);
motif 2 spans strands beta-2 and beta-2'; and motif 3 encodes beta-strand 3 and a conserved Asp (in position 7), which
coordinates the metal ion [6,7]. Three iterations on OWL27.0 were required to reach convergence, at which point a
true set comprising 56 sequences was identified. Numerous partial matches were also found.
[1624] 717. (WD40) WD domain, G-beta repeat 

The ancient regulatory-protein family of WD-repeat proteins.
Neer EJ, Schmidt CJ, Nambudripad R, Smith TF;
Nature 1994;371:297-300.

Beta-transducin (G-beta) is one of the three subunits (alpha, beta, and gamma) of the guanine nucleotide-binding
proteins (G proteins) which act as intermediaries in the transduction of signals generated by transmembrane receptors
[1]. The alpha subunit binds to and hydrolyzes GTP; the functions of the beta and gamma subunits are less clear but
they seem to be required for the replacement of GDP by GTP as well as for membrane anchoring and receptor recognition.

[1625] In higher eukaryotes G-beta exists as a small multigene family of highly conserved proteins of about 340
amino acid residues. Structurally G-beta consists of eight tandem repeats of about 40 residues, each containing a
central Trp-Asp motif (this type of repeat is sometimes called a WD-40 repeat). Such a repetitive segment has been
shown [E1,2,3,4,5] to exist in a number of other proteins listed below:

- Yeast STE4, a component of the pheromone response pathway. STE4 is a G-beta like protein that associates with
GPA1 (G-alpha) and STE18 (G-gamma).
- Yeast MSI1, a negative regulator of RAS-mediated cAMP synthesis. MSI1 is most probably also a G-beta protein.
- Human and chicken protein 12.3. The function of this protein is not known, but on the basis of its similarity to
G-beta proteins, it may also function in signal transduction.
- Chlamydomonas reinhardtii gblp. This protein is most probably the homolog of vertebrate protein 12.3.
- Human LIS1, a neuronal protein involved in type-1 lissencephaly [E2].
- Mammalian coatomer beta' subunit (beta'-COP), a component of a cytosolic protein complex that reversibly
associates with Golgi membranes to form vesicles that mediate biosynthetic protein transport.
- Yeast CDC4, essential for initiation of DNA replication and separation of the spindle pole bodies to form the poles
of the mitotic spindle.
- Yeast CDC20, a protein required for two microtubule-dependent processes: nuclear movements prior to anaphase
and chromosome separation.
- Yeast MAK11, essential for cell growth and for the replication of M1 double-stranded RNA.
- Yeast PRP4, a component of the U4/U6 small nuclear ribonucleoprotein with a probable role in mRNA splicing.
- Yeast PWP1, a protein of unknown function.
- Yeast SKI8, a protein essential for controlling the propagation of double-stranded RNA.
- Yeast SOF1, a protein required for ribosomal RNA processing which associates with U3 small nucleolar RNA.
- Yeast TUP1 (also known as AER2 or SFL2 or CYC9), a protein which has been implicated in dTMP uptake,
catabolite repression, mating sterility, and many other phenotypes.
- Yeast YCR57c, an ORF of unknown function from chromosome III.
- Yeast YCR72c, an ORF of unknown function from chromosome III.
- Slime mold coronin, an actin-binding protein.
- Slime mold AAC3, a developmentally regulated protein of unknown function.
- Drosophila protein Groucho (formerly known as E(spl); 'enhancer of split'), a protein involved in neurogenesis and
that seems to interact with the Notch and Delta proteins.
- Drosophila TAF-II-80, a protein that is tightly associated with TFIID.

[1626] The number of repeats in the above proteins varies between 5 (PRP4, TUP1, and Groucho) and 8 (G-beta,
STE4, MSI1, AAC3, CDC4, PWP1, etc.). In G-beta and G-beta like proteins, the repeats span the entire length of the
sequence, while in other proteins, they make up the N-terminal, the central or the C-terminal section. 
[1627] A signature pattern can be developed from the central core of the domain (positions 9 to 23). 

- Consensus pattern: [LIVMSTAC]-[LIVMFYWSTAGC]-[LIMSTAG]-[LIVMSTAGC]-x(2)-[DN]-x(2)-[LIVMWSTAC]-x-[LIVMFSTAG]-W-[DEN]-[LIVMFSTAGCN]
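Because the number of repeats differs between proteins, it can be useful to count how many times the core pattern occurs in one sequence. The following Python sketch is offered only as an illustration and is not part of the original disclosure. It assumes a hand translation of the consensus pattern above into a regular expression; the name `wd_core_positions` and the toy sequence (three copies of an invented 15-residue core separated by proline spacers) are hypothetical.

```python
import re

# Hand translation of the 15-position WD-repeat core pattern:
# x(2) becomes ".{2}", a single x becomes ".", bracketed sets are kept as is.
WD_CORE = re.compile(
    r"[LIVMSTAC][LIVMFYWSTAGC][LIMSTAG][LIVMSTAGC].{2}[DN].{2}"
    r"[LIVMWSTAC].[LIVMFSTAG]W[DEN][LIVMFSTAGCN]"
)


def wd_core_positions(sequence):
    """Return the 1-based start of every non-overlapping match of the core pattern."""
    return [m.start() + 1 for m in WD_CORE.finditer(sequence)]


# Invented core that satisfies every position of the pattern; not a real protein.
core = "LLSGSSDGTLRLWDL"
toy = ("P" * 25 + core) * 3

positions = wd_core_positions(toy)
print(len(positions), "core matches at", positions)
```

A pattern hit count is only a rough estimate of the repeat number, since divergent repeats may not match the consensus.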

[ 1] Gilman A.G.
Annu. Rev. Biochem. 56:615-649(1987).
[ 2] Duronio R.J., Gordon J.I., Boguski M.S.
Proteins 13:41-56(1992).
[ 3] van der Voorn L., Ploegh H.L.
FEBS Lett. 307:131-134(1992).
[ 4] Neer E.J., Schmidt C.J., Nambudripad R., Smith T.F.
Nature 371:297-300(1994).
[ 5] Smith T.F., Gaiatzes C.G., Saxena K., Neer E.J.
Biochemistry In Press(1998).

[1628] 718. WHEP-TRS domain containing proteins
A conserved domain of 46 amino acids has been shown [1] to exist in a number of higher eukaryote aminoacyl-transfer
RNA synthetases. This domain is present one to six times in the following enzymes:
- Mammalian multifunctional aminoacyl-tRNA synthetase. The domain is present three times in a region that separates
the N-terminal glutamyl-tRNA synthetase domain from the C-terminal prolyl-tRNA synthetase domain.
- Drosophila multifunctional aminoacyl-tRNA synthetase. The domain is present six times in the intercatalytic region.
- Mammalian tryptophanyl-tRNA synthetase. The domain is found at the N-terminal extremity.
- Mammalian, insect, nematode and plant glycyl-tRNA synthetase. The domain is found at the N-terminal extremity [2].
- Mammalian histidyl-tRNA synthetase. The domain is found at the N-terminal extremity.

[1629] This domain, which is called WHEP-TRS, could contain a central alpha-helical region and may play a role in
the association of tRNA-synthetases into multienzyme complexes.

[1630] A signature pattern based on the first 29 positions of the WHEP domain has been developed.

- Consensus pattern: [QY]-G-[DNEA]-x-[LIV]-[KR]-x ... (3)-K

[ 1] Cerini C., Kerjan P., Astier M., Gratecos D., Mirande M., Semeriva M. EMBO J. 10:4267-4277(1991).
[ 2] Nada S., Chang P.K., Dignam J.D.
J. Biol. Chem. 268:7660-7667(1993).

[1631] 719. (Worm family 8) Putative membrane protein

Analysis of protein domain families in Caenorhabditis elegans.
Sonnhammer EL, Durbin R;
Genomics 1997;46:200-216.

This family, called family 8 in [1], may be a transmembrane protein.
The specific function of this protein is unknown.
[1632] 720. Xylose isomerase 

Xylose isomerase (EC 5.3.1.5) [1] is an enzyme found in microorganisms which catalyzes the interconversion of D-xylose
to D-xylulose. It can also isomerize D-ribose to D-ribulose and D-glucose to D-fructose. Xylose isomerase seems
to require magnesium for its activity, while cobalt is necessary to stabilize the tetrameric structure of the enzyme. A
number of residues are conserved in all known xylose isomerases.

[1633] Xylose isomerase also exists in plants [2] where it is homodimeric and is manganese-dependent. 
[1634] Two signature patterns for xylose isomerase have been developed. The first one is derived from a stretch
of five conserved amino acids that includes a glutamic acid residue known to be one of the four residues involved in
the binding of the magnesium ion [3]; this pattern also includes a lysine residue which is involved in the catalytic activity.
The second pattern is derived from a conserved region in the N-terminal section of the enzyme that includes a histidine
residue which has been shown [4] to be involved in the catalytic mechanism of the enzyme.
Consensus pattern: [LI]-E-P-K-P-x(2)-P 
[E is a magnesium ligand] 
[K is an active site residue] 

Consensus pattern: [FL]-H-D-x-D-[LIV]-x-[PD]-x-[GDE]
[H is an active site residue] 

[ 1] Dauter Z., Dauter M., Hemker J., Witzel H., Wilson K.S.
FEBS Lett. 247:1-8(1989).
[ 2] Kristo P.A., Saarelainen R., Fagerstrom R., Aho S., Korhola M.
Eur. J. Biochem. 237:240-246(1996).
[ 3] Henrick K., Collyer C.A., Blow D.M.
J. Mol. Biol. 208:129-157(1989).
[ 4] Vangrysperre W., Ampe C., Kersters-Hilderson H., Tempst P.
Biochem. J. 263:195-199(1989).

[1635] 721. XPG protein signatures. Xeroderma pigmentosum (XP) [1] is a human autosomal recessive disease,
characterized by a high incidence of sunlight-induced skin cancer. Skin cells of people with this condition are hypersensitive
to ultraviolet light, due to defects in the incision step of DNA excision repair. There are a minimum of seven
genetic complementation groups involved in this pathway: XP-A to XP-G. The defect in XP-G can be corrected by a
133 Kd nuclear protein called XPG (or XPGC) [2]. XPG belongs to a family of proteins [2,3,4,5,6] that are composed
of two main subsets: - Subset 1, to which belongs XPG, RAD2 from budding yeast and rad13 from fission yeast. RAD2
and XPG are single-stranded DNA endonucleases [7,8]. XPG makes the 3' incision in human DNA nucleotide excision
repair [9]. - Subset 2, to which belongs mouse and human FEN-1, rad2 from fission yeast, and RAD27 from budding
yeast. FEN-1 is a structure-specific endonuclease. In addition to the proteins listed in the above groups, this family
also includes: - Fission yeast exo1, a 5'->3' double-stranded DNA exonuclease that could act in a pathway that corrects
mismatched base pairs. - Yeast EXO1 (DHS1), a protein with probably the same function as exo1. - Yeast DIN7.
Sequence alignment of this family of proteins reveals that similarities are largely confined to two regions. The first is
located at the N-terminal extremity (N-region) and corresponds to the first 95 to 105 amino acids. The second region
is internal (I-region) and found towards the C-terminus; it spans about 140 residues and contains a highly conserved
core of 27 amino acids that includes a conserved pentapeptide (E-A-[DE]-A-[QS]). It is possible that the conserved
acidic residues are involved in the catalytic mechanism of DNA excision repair in XPG. The amino acids linking the N-
and I-regions are not conserved; indeed, they are largely absent from proteins belonging to the second subset. Two
signature patterns have been developed for these proteins. The first corresponds to the central part of the N-region,
the second to part of the I-region and includes the putative catalytic core pentapeptide.

[1636] Consensus pattern: [VI]-[KRE]-P-x-[FYIL]-V-F-D-G-x(2)-[PIL]-x-[LVC]-K-

Consensus pattern: [GS]-[LIVM]-[PER]-[FYS]-[LIVM]-x-A-P-x-E-A-[DE]-[PAS]-[QS]-[CLM]-

[1637] [ 1] Tanaka K., Wood R.D. Trends Biochem. Sci. 19:83-86(1994).
[ 2] Scherly D., Nouspikel T., Corlet J., Ucla C., Bairoch A., Clarkson S.G. Nature 363:182-185(1993).
[ 3] Carr A.M., Sheldrick K.S., Murray J.M., Al-Harithy R., Watts F.Z., Lehmann A.R. Nucleic Acids Res. 21:1345-1349(1993).
[ 4] Murray J.M., Tavassoli M., Al-Harithy R., Sheldrick K.S., Lehmann A.R., Carr A.M., Watts F.Z. Mol. Cell. Biol. 14:4878-4888(1994).
[ 5] Harrington J.J., Lieber M.R. Genes Dev. 8:1344-1355(1994).
[ 6] Szankasi P., Smith G.R. Science 267:1166-1169(1995).
[ 7] Habraken Y., Sung P., Prakash L., Prakash S. Nature 366:365-368(1993).
[ 8] O'Donovan A., Scherly D., Clarkson S.G., Wood R.D. J. Biol. Chem. 269:15965-15968(1994).
[ 9] O'Donovan A., Davies A.A., Moggs J.G., West S.C., Wood R.D. Nature 371:432-435(1994).

[1638] 722. Xanthine/uracil permeases family

The following transport proteins which are involved in the uptake of xanthine or uracil are evolutionarily related [1]:

- Uric acid-xanthine permease (gene uapA) from Aspergillus nidulans.
- Purine permease (gene uapC) from Aspergillus nidulans.
- Xanthine permease from Bacillus subtilis (gene pbuX).
- Uracil permease from Escherichia coli (gene uraA) [2] and Bacillus (gene pyrP).
- Hypothetical protein ycdG from Escherichia coli.
- Hypothetical protein ygfO from Escherichia coli.
- Hypothetical protein ygfU from Escherichia coli.
- Hypothetical protein yicE from Escherichia coli.
- Hypothetical protein yunJ from Bacillus subtilis.
- Hypothetical protein yunK from Bacillus subtilis.

[1639] They are proteins of from 430 to 595 residues that seem to contain 12 transmembrane domains.
The best conserved region, which corresponds to what seems to be the tenth transmembrane domain, has been
selected as a signature pattern.

- Consensus pattern: [LIVM]-P-x-[PASIF]-V-[LIVM]-G-G-x(4)-[LIVM]-[FY]-[GSA]-x-[LIVM]-x(3)-G

[ 1] Diallinas G., Gorfinkiel L., Arst G., Cecchetto G., Scazzocchio C. J. Biol. Chem. 270:8610-8622(1995).

[ 2] Andersen P.S., Frees D., Fast R., Mygind B. J. Bacteriol. 177:2008-2013(1995).

[1640] 723. Hypothetical yabO/yceC/sfhB family

The following proteins, which seem to belong to a family of pseudouridine synthases (EC 4.2.1.70) [1], have been
shown to share regions of similarities:

- Escherichia coli and Haemophilus influenzae ribosomal large subunit pseudouridine synthase A (gene rluA). It is
responsible for synthesis of pseudouridine from uracil-746 in 23S rRNA.
- Escherichia coli and Haemophilus influenzae ribosomal large subunit pseudouridine synthase C (gene rluC). It is
responsible for synthesis of pseudouridine from uracil at positions 955, 2504 and 2580 in 23S rRNA.
- Escherichia coli protein and homologs in other bacteria: large subunit pseudouridine synthase D (gene rluD).
- Yeast DRAP deaminase (gene RIB2).
- Escherichia coli hypothetical protein yqcB and HI1435, the corresponding Haemophilus influenzae protein.
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Haemophilus influenzae hypothetical protein HI0042. 

- Aquifex aeolicus hypothetical protein AQ_1758.
- Bacillus subtilis hypothetical protein yhcT.
- Bacillus subtilis hypothetical protein yjbO.
- Bacillus subtilis hypothetical protein ylyB.
- Helicobacter pylori hypothetical protein HP0347.
- Helicobacter pylori hypothetical protein HP0745.
- Helicobacter pylori hypothetical protein HP0956.
- Mycoplasma genitalium hypothetical protein MG209.
- Mycoplasma genitalium hypothetical protein MG370.
- Synechocystis strain PCC 6803 hypothetical protein slr1592.
- Synechocystis strain PCC 6803 hypothetical protein slr1629.
- Yeast hypothetical protein YDL036c.
- Yeast hypothetical protein YGR169c.
- Fission yeast hypothetical protein SpAC18B11.02c.
- Caenorhabditis elegans hypothetical protein K07E8.7.

[1641] These are proteins of from 21 to 50 Kd which contain a number of conserved regions in their central section.
They can be picked up in the database by the following highly conserved pattern.

- Consensus pattern: [LIVCA]-[NHYT]-R-[LI]-D-x(2)-T-[STA]-G-[LIVAGC]-[LIVMF](2)-[LIVMFGC]-[SGTACV]

[1642] [ 1] Conrad J., Sun D., Englund N., Ofengand J. J. Biol. Chem. 273:18562-18566(1998).
[1643] In addition, the following bacterial proteins, which seem to belong to a family of pseudouridine synthases
(EC 4.2.1.70) [1], also have been shown to share regions of similarities:

- Escherichia coli and Haemophilus influenzae 16S pseudouridylate 516 synthase (EC 4.2.1.70) (gene: rsuA). This
enzyme is responsible for the formation of pseudouridine from uracil-516 in 16S ribosomal RNA.
- Escherichia coli hypothetical protein yciL and HI1199, the corresponding Haemophilus influenzae protein.
- Escherichia coli hypothetical protein yjbC.
- Escherichia coli hypothetical protein ymfC and HI0694, the corresponding Haemophilus influenzae protein.

Aquifex aeolicus hypothetical protein AQ_554. 

Aquifex aeolicus hypothetical protein AQ_1464. 

Bacillus subtilis hypothetical protein ypuL. 
Bacillus subtilis hypothetical protein ytzR.

Borrelia burgdorferi hypothetical protein BB0129.

Helicobacter pylori hypothetical protein HP1459.

Synechocystis strain PCC 6803 hypothetical protein slr0361.

Synechocystis strain PCC 6803 hypothetical protein slr0612. 


[1644] These are proteins of from 25 to 40 Kd which contain a number of conserved regions in their central section. 
They can be picked up in the database by the following highly conserved pattern. 

- Consensus pattern: G-R-L-D-x(2)-[STA]-x-G-[LIVFA]-[LIVMF](3)-[ST]-[DNST] 
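
The consensus patterns in this section follow the PROSITE notation: elements are separated by hyphens, `x` stands for any residue, `[...]` lists the residues allowed at a position, `{...}` lists residues that are excluded, and `(n)` or `(n,m)` gives a repeat count. As a minimal sketch of how such a pattern can be applied to a sequence, the notation can be translated into a regular expression. The function name `prosite_to_regex` and the test fragment are illustrative assumptions, not part of the databases cited here.

```python
import re

def prosite_to_regex(pattern: str) -> str:
    """Translate a PROSITE-style consensus pattern into a Python regex string."""
    parts = []
    for element in pattern.strip().rstrip(".").split("-"):
        if not element:
            continue  # tolerate a trailing hyphen
        # Separate the residue specification from an optional (n) or (n,m) suffix.
        m = re.fullmatch(r"(.+?)(?:\((\d+)(?:,(\d+))?\))?", element)
        core, low, high = m.group(1), m.group(2), m.group(3)
        if core == "x":
            core = "."                       # any residue
        elif core.startswith("{"):
            core = "[^" + core[1:-1] + "]"   # excluded residues
        # A bracketed set or a single residue letter is already a valid regex atom.
        if low and high:
            core += "{" + low + "," + high + "}"
        elif low:
            core += "{" + low + "}"
        parts.append(core)
    return "".join(parts)

# The pattern given above for this family.
PATTERN = "G-R-L-D-x(2)-[STA]-x-G-[LIVFA]-[LIVMF](3)-[ST]-[DNST]"
regex = prosite_to_regex(PATTERN)

# Hypothetical fragment constructed to satisfy the pattern (not a database entry).
fragment = "MKGRLDKDTSGLLLLTND"
hit = re.search(regex, fragment)
print(regex)
print(hit.start() + 1, hit.group())  # 1-based start and the matched residues
```

The same helper handles the variable-length and exclusion elements that appear in other patterns of this document, for example `x(4,5)` and `{DEHRKP}`.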


[1645] [ 1] Wrzosinski J., Bakin A., Nurse K., Lane B.G., Ofengand J. Biochemistry 34:8904-8913(1995).
[1646] 724. Zinc finger present in dystrophin, CBP/p300 
ZZ in dystrophin binds calmodulin 
Putative zinc finger; binding not yet shown. 
[1647] 725. Zinc carboxypeptidase

There are a number of different types of zinc-dependent carboxypeptidases (EC 3.4.17.-) [1,2]. All these enzymes 
seem to be structurally and functionally related. The enzymes that belong to this family are listed below. 

- Carboxypeptidase A1 (EC 3.4.17.1), a pancreatic digestive enzyme that can remove all C-terminal amino acids
with the exception of Arg, Lys and Pro.

- Carboxypeptidase A2 (EC 3.4.17.15), a pancreatic digestive enzyme with a specificity similar to that of
carboxypeptidase A1, but with a preference for bulkier C-terminal residues.

Carboxypeptidase B (EC 3.4.17.2), also a pancreatic digestive enzyme, but that preferentially removes C-terminal 
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EMBOJ 2:1009-1014(19B3). H R7^ are an extensive group ot hormones. 
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. 5-hydroxytryptamirie (serotonin) 1 A to ^F, 2A to 2C, 4, 5A, 
- Acetylcholine, muscarinic-type M1 to M5.
- Adenosine A1, A2A, A2B.
- Adrenergic alpha-1A to -1C, alpha-2A to -2D.
- Angiotensin II types I and III.
- Bombesin subtypes 3 and 4.
- Bradykinin B1 and B2.
- C3a and C5a anaphylatoxin.
- Cannabinoid CB1 and CB2.
- Chemokines C-C.
- Follicle stimulating hormone (FSH-R).
- Odorants [11].
- Opioids delta-, kappa- and mu-types [12].
- Oxytocin (OT-R).
- Platelet activating factor (PAF-R).
- Prostacyclin.
- Prostaglandin F2.
- Purinoreceptors (ATP) [13].
- Somatostatin types 1 to 5.
- Substance-K (NK-2R).
- Substance-P (NK-1R).
- Thrombin.
- Thromboxane A2.

integral membrane proteins with seven transmembrane regions that belong to family 1 of G-protein coupled receptors.
[1685] In vertebrates four different pigments are generally found. Rod cells, which mediate vision in dim light, contain
the pigment rhodopsin. Cone cells, which function in bright light, are responsible for color vision and contain three or
more color pigments (for example, in mammals: red, blue and green).

[1686] In Drosophila, the eye is composed of 800 facets or ommatidia. Each ommatidium contains eight
photoreceptor cells (R1-R8): the R1 to R6 cells are outer cells, R7 and R8 inner cells. Each of the three types of cells (R1-R6,
R7 and R8) expresses a specific opsin.

[1687] Proteins evolutionarily related to opsins include squid retinochrome, also known as retinal photoisomerase,
which converts various isomers of retinal into 11-cis retinal, and mammalian retinal pigment epithelium (RPE) RGR [3],
a protein that may also act in retinal isomerization.

[1688] The attachment site for retinal in the above proteins is a conserved lysine residue in the middle of the seventh 
transmembrane helix. The pattern that had been developed includes this residue. 

- Consensus pattern: [LIVMWAC]-[PGC]-x(3)-[SAC]-K-[STALIMR]-[GSACPNV]-[STACP]-x(2)-[DENF]-[AP]-x(2)-[IY]

[K is the retinal binding site] 

[ 1] Applebury M.L., Hargrave P.A.
Vision Res. 26:1881-1895(1986).
[ 2] Fryxell K.J., Meyerowitz E.M.
J. Mol. Evol. 33:367-378(1991).
[ 3] Shen D., Jiang M., Hao W., Tao L., Salazar M., Fong H.K.W.
Biochemistry 33:13117-13125(1994).

[1689] The following descriptions of protein family functions are not provided by the Pfam or Prosite databases. 
[1690] 740. BAH 

BAH domain. Number of members: 65 

[1] Medline: 97074677. Molecular cloning of polybromo, a nuclear protein containing multiple domains including 
five bromodomains, a truncated HMG-box, and two repeats of a novel domain. Nicolas RH, Goodwin GH; Gene 
1996;175:233-240. 

[2] Medline: 99198739. The BAH (bromo-adjacent homology) domain: a link between DNA methylation, replication
and transcriptional regulation. Callebaut I, Courvalin J-C, Mornon JP; FEBS Lett 1999;446:189-193.

[1691] 741. ELM2.

ELM2 domain. The ELM2 (Egl-27 and MTA1 homology 2) domain is a small domain of unknown function. Number of 
members: 10 

[1692] 742. Euk_porin. EUKARYOTIC_PORIN. The major protein of the outer mitochondrial membrane of eukaryotes
is a porin that forms a voltage-dependent anion-selective channel (VDAC) that behaves as a general diffusion pore
for small hydrophilic molecules [1 to 4]. The channel adopts an open conformation at low or zero membrane potential
and a closed conformation at potentials above 30-40 mV.

This protein contains about 280 amino acids and its sequence is composed of between 12 and 16 beta-strands that span
the mitochondrial outer membrane. Yeast contains two members of this family (genes POR1 and POR2); vertebrates
have at least three members (genes VDAC1, VDAC2 and VDAC3) [5].

A conserved region located at the C-terminal part of these proteins was selected as a signature pattern.
Consensus pattern: [YH]-x(2)-D-[SPCAD]-x-[STA]-x(3)-[TAG]-[KRHLIVMF]-[DNSTA]-[DNS]-x(4)-[GSTAN]-[LIVMA]-x-[LIVMY]

[ 1] Benz R. Biochim. Biophys. Acta 1197:167-196(1994).
[ 2] Mannella C.A. Trends Biochem. Sci. 17:315-320(1992).
[ 3] Dihanich M. Experientia 46:146-153(1990).
[ 4] Forte M., Guy H.R., Mannella C.A. J. Bioenerg. Biomembr. 19:341-350(1987).
[ 5] Sampson M.J., Lovell R.S., Davison D.B., Craigen W.J. Genomics 36:192-196(1996).

[1693] 743. Glyco_hydro_19

Chitinases family 19 signatures
cross-reference(s) CHITINASE_19_1, CHITINASE_19_2

Chitinases (EC 3.2.1.14) [1] are enzymes that catalyze the hydrolysis of the beta-1,4-N-acetyl-D-glucosamine linkages



Fom the viewpoint 

in chilin polymer* From he view p o| „ y 19 ^ by do6Uoymg J° f c ^ jnding dom ain 

, ica tlon ol glycosyl n tl ninot lun n ni "^Ence * a N-""™^^ 230 amin0 
"«» ^ '"S on- ymos dilter in >no P'ff^* <* th*e enzymes cons.st o. about 
wall. Class IA/1 ^ '^iroC00025>). The cata.yt.c domam 

(see the relevant entry <PUU one js )ocaled in the" b , 

— * — 

and contains one ot thos.xcys ,YFl-x(2)-F-lGSA] 
I HFIach J.. Pi** P " E - J ° j 280 309-3l6(1991). 

SSWKTS-" — » — 

l21Medl.ne. 99 l58 ^° voni N , SZ y1 M; Nature i999.397.57 
Bamchandam S. cervo. 

=~-.-s==- : ---;rr. -.r 

Vertebrate calpams (EC 3*2* , inal calc.um-b nd.ng dom ^ 

a terminal ca»«lri« ^J^. involved in osteoclast bone glycope ptide). 
Mamma lian catheps.n K, wh.ch . nBClivBlio n ol the antitumor drug BLM £ S 1 ^^ acli nidin 

Human c.thops.n O HV lnal catalyzes the ^cUvg.on ^ ^ bean SH-EP, k. ^ ^ 

Bleomycin hydrolase. An on*y E P-B1/B4, kianey C ancain ( EC ^ " ' 32V rape 

COT44; riceoryzainalpha.be , rDr 3 cpr-4. cpr-5 and cpr- 

. ^uSS mites .lergens O^ 

. CathepsinB-likeo^ 

6), schistosoma manson. (ant.g anrf cp . 3) 
- AC-2),andOstertag.ac«tertag.tU p ^ ndcp2 

. slime mold oy^Jno^Z c,u*l «nd brucol. plasmodiom species. 




Drosophila small optic lobes protein (gene sol), a neuronal protein that contains a calpain-like domain. 
- Yeast thiol protease BLH1/YCP1/LAP3. 

- Caenorhabditis elegans hypothetical protein C06G4.2, a calpain-like protein.

[1697] Two bacterial peptidases are also part of this family:

- Aminopeptidase C from Lactococcus lactis (gene pepC) [5].
- Thiol protease tpr from Porphyromonas gingivalis.

[1698] Three other proteins are structurally related to this family, but may have lost their proteolytic activity.

- Soybean oil body protein P34. This protein has its active site cysteine replaced by a glycine.
- Rat testin, a Sertoli cell secretory protein highly similar to cathepsin L but with the active site cysteine replaced
by a serine. Rat testin should not be confused with mouse testin which is a LIM-domain protein (see
<PDOC00382>).
- Plasmodium falciparum serine-repeat protein (SERA), the major blood stage antigen. This protein of 111 Kd
possesses a C-terminal thiol-protease-like domain [6], but the active site cysteine is replaced by a serine.

The sequences around the three active site residues are well conserved and can be used as signature patterns.
[1699] Consensus pattern: Q-x(3)-[GE]-x-C-[YW]-x(2)-[STAGC]-[STAGCV] [C is the active site residue]

Note: the residue in position 4 of the pattern is almost always cysteine; the only exceptions are calpains (Leu), bleomycin
hydrolase (Ser) and yeast YCP1 (Ser).
Note: the residue in position 5 of the pattern is always Gly except in papaya protease IV where it is Glu.

Consensus pattern: [LIVMGSTAN]-x-H-[GSACE]-[LIVM]-x-[LIVMAT](2)-G-x-[GSADNH] [H is the active site residue]

Consensus pattern: [FYCH]-[WI]-[LIVT]-x-[KRQAG]-N-[ST]-W-x(3)-[FYW]-G-x(2)-G-[LFYW]-[LIVMFY [N
is the active site residue]
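
The position notes attached to the first (cysteine) pattern can be checked mechanically. The sketch below is a hand translation of that pattern into a regular expression in which positions 4 and 5 and the active site cysteine are captured separately. The names `CYS_SITE` and `describe_cys_site`, and the test fragment, are assumptions made for illustration only; the fragment was chosen to satisfy the pattern and is not presented as a database record.

```python
import re

# Hand translation of Q-x(3)-[GE]-x-C-[YW]-x(2)-[STAGC]-[STAGCV].
# x(3) is written as two free residues plus one captured residue so that
# position 4 of the pattern can be inspected on its own.
CYS_SITE = re.compile(
    r"Q..(?P<pos4>.)(?P<pos5>[GE]).(?P<active>C)[YW]..[STAGC][STAGCV]"
)

def describe_cys_site(sequence):
    """Return details of the first pattern match, or None if there is none."""
    m = CYS_SITE.search(sequence)
    if m is None:
        return None
    return {
        "start": m.start() + 1,                # 1-based start of the match
        "pos4": m.group("pos4"),               # usually Cys according to the note
        "pos5": m.group("pos5"),               # Gly, or Glu in papaya protease IV
        "active_cys": m.start("active") + 1,   # 1-based position of the active site C
    }

fragment = "PVKNQGSCGSCWAFSAV"  # illustrative fragment (assumption)
print(describe_cys_site(fragment))
```

A sequence in which position 4 is Leu or Ser still matches, since the pattern itself leaves that position free; only the returned `pos4` value differs.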

Note: these proteins belong to family C1 (papain-type) and C2 (calpains) in the classification of peptidases [7,E1].

[ 1]Dufour E. Biochimie 70:1335-1342(1988).
[ 2]Kirschke H., Barrett A.J., Rawlings N.D. Protein Prof. 2:1587-1643(1995).
[ 3]Shi G.-P., Chapman H.A., Bhairi S.M., Deleeuw C., Reddy V.Y., Weiss S.J. FEBS Lett. 357:129-134(1995).
[ 4]Velasco G., Ferrando A.A., Puente X.S., Sanchez L.M., Lopez-Otin C. J. Biol. Chem. 269:27136-27142(1994).
[ 5]Chapot-Chartier M.P., Nardi M., Chopin M.C., Chopin A., Gripon J.C. Appl. Environ. Microbiol. 59:330-333(1993).
[ 6]Higgins D.G., McConnell D.J., Sharp P.M. Nature 340:604-604(1989).
[ 7]Rawlings N.D., Barrett A.J. Meth. Enzymol. 244:461-486(1994).

[1700] 746. Peptidase M22

Glycoprotease family signature. cross-reference(s) GLYCOPROTEASE
Glycoprotease (GCP) (EC 3.4.24.57) [1], or O-sialoglycoprotein endopeptidase, is a metalloprotease secreted by
Pasteurella haemolytica which specifically cleaves O-sialoglycoproteins such as glycophorin A. The sequence of GCP is
highly similar to the following uncharacterized proteins:

- Escherichia coli hypothetical protein ygjD (ORF-X).
- Bacillus subtilis hypothetical protein ydiE.
- Mycobacterium leprae hypothetical protein U229E.
- Mycobacterium tuberculosis hypothetical protein MtCY78.10.
- Synechocystis strain PCC 6803 hypothetical protein slr0807.
- Methanococcus jannaschii hypothetical protein MJ1130.
- Haloarcula marismortui hypothetical protein in HSH 3' region.
- Yeast hypothetical protein YKR038C.
- Yeast hypothetical protein QRI7.

[1701] One of the conserved regions contains two conserved histidines. It is possible that this region is involved in
coordinating a metal ion such as zinc.

[1702] Consensus pattern: [KR]-[GSAT]-x(4)-[FYWLH]-[DQNGK]-x-P-x-[LIVMFY]-x(3)-H-x(2)-[AG]-H-[LIVM]
Note: these proteins belong to family M22 in the classification of peptidases [2,E1].
[ 1]Abdullah K.M., Lo R.Y.C., Mellors A. J. Bacteriol. 173:5597-5603(1991).
[ 2]Rawlings N.D., Barrett A.J. Meth. Enzymol. 248:183-228(1995).

[1703] 747. SAM. SAM domain (Sterile alpha motif)
It has been suggested that SAM is an evolutionarily conserved protein binding domain that is involved in the regulation
of numerous developmental processes in diverse eukaryotes. The SAM domain can potentially function as a protein
interaction module through its ability to homo- and heterooligomerise with other SAM domains. Number of members: 81

[1]Medline: 96100659 SAM: A novel motif in yeast sterile alpha and Drosophila polyhomeotic proteins. Ponting CP;
Prot Sci 1995;4:1928-1930.

[2]Medline: 97160498 SAM as a protein interaction domain involved in developmental regulation. Shultz J, Ponting
CP, Hofmann K, Bork P; Prot Sci 1997;6:249-253.

[3]Medline: 99101382 The crystal structure of an Eph receptor SAM domain reveals a mechanism for modular
dimerization. Stapleton D, Balan I, Pawson T, Sicheri F; Nat Struct Biol 1999;6:44-49.

[1704] 748. Tyrosinase signatures. cross-reference(s) TYROSINASE_1; TYROSINASE_2. Tyrosinase (EC 1.14.18.1)
[1] is a copper monooxygenase that catalyzes the hydroxylation of monophenols and the oxidation of o-diphenols to
o-quinols. This enzyme, found in prokaryotes as well as in eukaryotes, is involved in the formation of pigments such
as melanins and other polyphenolic compounds.

[1705] Tyrosinase binds two copper ions (CuA and CuB). Each of the two copper ions has been shown [2] to be bound
by three conserved histidine residues. The regions around these copper-binding ligands are well conserved and also
shared by some hemocyanins, which are copper-containing oxygen carriers from the hemolymph of many molluscs
and arthropods [3,4].

[1706] At least two proteins related to tyrosinase are known to exist in mammals:

- TRP-1 (TYRP1) [5], which is responsible for the conversion of 5,6-dihydroxyindole-2-carboxylic acid (DHICA) to
indole-5,6-quinone-2-carboxylic acid.

- TRP-2 (TYRP2) [6], which is the melanogenic enzyme DOPAchrome tautomerase (EC 5.3.3.12) that catalyzes
the conversion of DOPAchrome to DHICA. TRP-2 differs from tyrosinases and TRP-1 in that it binds two zinc ions
instead of copper [7].

[1707] Other proteins that belong to this family are:

- Plant polyphenol oxidases (PPO) (EC 1.10.3.1), which catalyze the oxidation of mono- and o-diphenols to
o-diquinones [8].

- Caenorhabditis elegans hypothetical protein C02C2.1.

[1708] Two signature patterns for tyrosinase and related proteins have been derived. The first one contains two of
the histidines that bind CuA, and is located in the N-terminal section of tyrosinase. The second pattern contains a
histidine that binds CuB; that pattern is located in the central section of the enzyme.

Consensus pattern: H-x(4,5)-F-[LIVMFTP]-x-[FW]-H-R-x(2)-[LM]-x(3)-E [The two H's are copper ligands]
[1709] Consensus pattern: D-P-x-F-[LIVMFYW]-x(2)-H-x(3)-D [H is a copper ligand]
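
As an illustration of how the two tyrosinase patterns locate the annotated copper ligands, the following sketch reports the 1-based positions of the histidines captured by each pattern. Both regular expressions are hand translations of the patterns above. The function name `copper_ligand_positions` and the synthetic test sequence are assumptions for this example; the sequence was built only to satisfy the patterns and does not come from any tyrosinase.

```python
import re

# H-x(4,5)-F-[LIVMFTP]-x-[FW]-H-R-x(2)-[LM]-x(3)-E, both H's captured.
CU_A = re.compile(r"(H).{4,5}F[LIVMFTP].[FW](H)R..[LM]...E")
# D-P-x-F-[LIVMFYW]-x(2)-H-x(3)-D, the single H captured.
CU_B = re.compile(r"DP.F[LIVMFYW]..(H)...D")

def copper_ligand_positions(sequence):
    """Return 1-based positions of the histidines annotated as copper ligands."""
    found = {}
    a = CU_A.search(sequence)
    if a:
        found["CuA"] = [a.start(1) + 1, a.start(2) + 1]
    b = CU_B.search(sequence)
    if b:
        found["CuB"] = [b.start(1) + 1]
    return found

# Synthetic sequence (assumption): CuA-like block, linker, CuB-like block.
synthetic = "MA" + "HEAPAFLPWHRAYLLFWE" + "GGSG" + "DPIFLLHHAFVD" + "K"
print(copper_ligand_positions(synthetic))
```

Note that the patterns cover only two of the three CuA histidines and one of the CuB histidines, so the reported positions are a subset of the full set of ligands described in the text.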

[ 1]Lerch K. Prog. Clin. Biol. Res. 256:85-98(1988).
[ 2]Jackman M.P., Hajnal A., Lerch K. Biochem. J. 274:707-713(1991).
[ 3]Linzen B. Naturwissenschaften 76:206-211(1989).
[ 4]Lang W.H., van Holde K.E. Proc. Natl. Acad. Sci. U.S.A. 88:244-248(1991).
[ 5]Kobayashi T., Urabe K., Winder A., Jimenez-Cervantes C., Imokawa G., Brewington T., Solano F.,
Garcia-Borron J.C., Hearing V.J. EMBO J. 13:5818-5825(1994).
[ 6]Jackson I.J., Chambers D.M., Tsukamoto K., Copeland N.G., Gilbert D.J., Jenkins N.A., Hearing V. EMBO J.
11:527-535(1992).
[ 7]Solano F., Martinez-Liarte J.H., Jimenez-Cervantes C., Garcia-Borron J.C., Lozano J.A. Biochem. Biophys.
Res. Commun. 204:1243-1250(1994).
[ 8]Cary J.W., Lax A.R., Flurkey W.H. Plant Mol. Biol. 20:245-253(1992).

[1710] 749. (Mur Ligase) Folylpolyglutamate synthase signatures

Folylpolyglutamate synthase (EC 6.3.2.17) (FPGS) [1] is the enzyme of folate metabolism that catalyzes
ATP-dependent addition of glutamate moieties to tetrahydrofolate.
[1711] Its sequence is moderately conserved between prokaryotes (gene folC) and eukaryotes. We developed two
signature patterns based on the conserved regions which are rich in glycine residues and could play a role in the
catalytic activity and/or in substrate binding.
Description of pattern(s) and/or profile(s)
Consensus pattern: [LIVMFY]-x-[LIVM]-[STAG]

Consensus pattern: [LIVMFY](2)-E-x-G-[LIVM]-[GA]-G-x(2)-D-x-[GST]-x-[LIVM](2)

[1712] [ 1]Shane B., Garrow T., Brenner A., Chen L., Choi Y.J., Hsu J.C., Stover P. Adv. Exp. Med. Biol. 338:629-634(1993).

[1713] 750. (Peptidase M3) Neutral zinc metallopeptidases, zinc-binding region signature
The majority of zinc-dependent metallopeptidases (with the notable exception of the carboxypeptidases) share a
common pattern of primary structure [1,2,3] in the part of their sequence involved in the binding of zinc, and can be grouped
together as a superfamily, known as the metzincins, on the basis of this sequence similarity. They can be classified into
a number of distinct families [4,E1] which are listed below along with the proteases which are currently known to belong
to these families.
[1714] Family M1

- Bacterial aminopeptidase N (EC 3.4.11.2) (gene pepN).
- Mammalian aminopeptidase N (EC 3.4.11.2).
- Mammalian glutamyl aminopeptidase (EC 3.4.11.7) (aminopeptidase A). It may play a role in regulating growth
and differentiation of early B-lineage cells.
- Yeast aminopeptidase yscII (gene APE2).
- Yeast alanine/arginine aminopeptidase (gene AAP1).
- Yeast hypothetical protein YIL137c.
- Leukotriene A-4 hydrolase (EC 3.3.2.6). This enzyme is responsible for the hydrolysis of an epoxide moiety of
LTA-4 to form LTB-4; it has been shown that it binds zinc and is capable of peptidase activity.

[1715] Family M2

- Angiotensin-converting enzyme (EC 3.4.15.1) (dipeptidyl carboxypeptidase 1) (ACE), the enzyme responsible for
hydrolyzing angiotensin I to angiotensin II. There are two forms of ACE: a testis-specific isozyme and a somatic
isozyme which has two active centers.

[1716] Family M3

- Thimet oligopeptidase (EC 3.4.24.15), a mammalian enzyme involved in the cytoplasmic degradation of small
peptides.
- Neurolysin (EC 3.4.24.16) (also known as mitochondrial oligopeptidase M or microsomal endopeptidase).
- Mitochondrial intermediate peptidase precursor (EC 3.4.24.59) (MIP). It is involved in the second stage of processing
of some proteins imported in the mitochondrion.
- Yeast saccharolysin (EC 3.4.24.37) (proteinase yscD).
- Escherichia coli and related bacteria dipeptidyl carboxypeptidase (EC 3.4.15.5) (gene dcp).
- Escherichia coli and related bacteria oligopeptidase A (EC 3.4.24.70) (gene opdA or prlC).
- Yeast hypothetical protein YKL134c.

[1717] Family M4

- Thermostable thermolysins (EC 3.4.24.27), and related thermolabile neutral proteases (bacillolysins) (EC
3.4.24.28) from various species of Bacillus.
- Pseudolysin (EC 3.4.24.26) from Pseudomonas aeruginosa (gene lasB).
- Extracellular elastase from Staphylococcus epidermidis.
- Extracellular protease prt1 from Erwinia carotovora.
- Extracellular minor protease smp from Serratia marcescens.
- Vibriolysin (EC 3.4.24.25) from various species of Vibrio.
- Protease prtA from Listeria monocytogenes.
- Extracellular proteinase proA from Legionella pneumophila.

[1718] Family M5

- Mycolysin (EC 3.4.24.31) from Streptomyces cacaoi.

[1719] Family M6

- Immune inhibitor A from Bacillus thuringiensis (gene ina). Ina degrades two classes of insect antibacterial proteins,
attacins and cecropins.

[1720] Family M7

- Streptomyces extracellular small neutral proteases.

[1721] Family M8

- Leishmanolysin (EC 3.4.24.36) (surface glycoprotein gp63), a cell surface protease from various species of
Leishmania.

[1722] Family M9

- Microbial collagenase (EC 3.4.24.3) from Clostridium perfringens and Vibrio alginolyticus.

[1723] Family M10A

- Serralysin (EC 3.4.24.40), an extracellular metalloprotease from Serratia.
- Alkaline metalloproteinase from Pseudomonas aeruginosa (gene aprA).
- Secreted proteases A, B, C and G from Erwinia chrysanthemi.
- Yeast hypothetical protein YIL108w.

[1724] Family M10B

- Mammalian extracellular matrix metalloproteinases (known as matrixins) [5]: MMP-1 (EC 3.4.24.7) (interstitial
collagenase), MMP-2 (EC 3.4.24.24) (72 Kd gelatinase), MMP-9 (EC 3.4.24.35) (92 Kd gelatinase), MMP-7 (EC
3.4.24.23) (matrilysin), MMP-8 (EC 3.4.24.34) (neutrophil collagenase), MMP-3 (EC 3.4.24.17) (stromelysin-1),
MMP-10 (EC 3.4.24.22) (stromelysin-2), and MMP-11 (stromelysin-3), MMP-12 (EC 3.4.24.65) (macrophage
metalloelastase).
- Sea urchin hatching enzyme (envelysin) (EC 3.4.24.12). A protease that allows the embryo to digest the protective
envelope derived from the egg extracellular matrix.
- Soybean metalloendoproteinase 1.



[1725] Family M11

- Chlamydomonas reinhardtii gamete lytic enzyme (GLE).

[1726] Family M12A

- Astacin (EC 3.4.24.21), a crayfish endoprotease.
- Meprin A (EC 3.4.24.18), a mammalian kidney and intestinal brush border metalloendopeptidase.
- Bone morphogenic protein 1 (BMP-1), a protein which induces cartilage and bone formation and which expresses
metalloendopeptidase activity. The Drosophila homolog of BMP-1 is the dorsal-ventral patterning protein tolloid.
- Blastula protease 10 (BP10) from Paracentrotus lividus and the related protein SpAN from Strongylocentrotus
purpuratus.
- Caenorhabditis elegans protein toh-2.
- Caenorhabditis elegans hypothetical protein F42A10.8.
- Choriolysins L and H (EC 3.4.24.67) (also known as embryonic hatching proteins LCE and HCE) from the fish
Oryzias latipes. These proteases participate in the breakdown of the egg envelope, which is derived from the
egg extracellular matrix, at the time of hatching.

[1727] Family M12B

- Snake venom metalloproteinases [6]. This subfamily mostly groups proteases that act in hemorrhage. Examples
are: adamalysin II (EC 3.4.24.46), atrolysin C/D (EC 3.4.24.42), atrolysin E (EC 3.4.24.44), fibrolase (EC 3.4.24.72),
trimerelysin I (EC 3.4.24.52) and II (EC 3.4.24.53).
- Mouse cell surface antigen MS2.

[1728] Family M13

- Mammalian neprilysin (EC 3.4.24.11) (neutral endopeptidase) (NEP).

- Endothelin-converting enzyme 1 (EC 3.4.24.71) (ECE-1), which processes the precursor of endothelin to release the
active peptide.

- Kell blood group glycoprotein, a major antigenic protein of erythrocytes. The Kell protein is very probably a zinc 
endopeptidase. 

Peptidase O from Lactococcus lactis (gene pepO). 
[1729] Family M27 

- Clostridial neurotoxins, including tetanus toxin (TeTx) and the various botulinum toxins (BoNT). These toxins are 
zinc proteases that block neurotransmitter release by proteolytic cleavage of synaptic proteins such as synapto- 
brevins, syntaxin and SNAP-25 [7,8]. 

[1730] Family M30 

- Staphylococcus hyicus neutral metalloprotease.

[1731] Family M32

- Thermostable carboxypeptidase 1 (EC 3.4.17.19) (carboxypeptidase Taq), an enzyme from Thermus aquaticus
which is most active at high temperature.

[1732] Family M34 

Lethal factor (LF) from Bacillus anthracis, one of the three proteins composing the anthrax toxin. 
[1733] Family M35 

Deuterolysin (EC 3.4.24.39) from Penicillium citrinum and related proteases from various species of Aspergillus. 

[1734] Family M36 

Extracellular elastinolytic metalloproteinases from Aspergillus. 

[1735] From the tertiary structure of thermolysin, the positions of the residues acting as zinc ligands and of those involved
in the catalytic activity are known. Two of the zinc ligands are histidines which are very close together in the sequence;
C-terminal to the first histidine is a glutamic acid residue which acts as a nucleophile and promotes the attack of a
water molecule on the carbonyl carbon of the substrate. A signature pattern which includes the two histidines and the
glutamic acid residue is sufficient to detect this superfamily of proteins.
[1736] Description of pattern(s) and/or profile(s)

Consensus pattern: [GSTALIVN]-x(2)-H-E-[LIVMFYW]-{DEHRKP}-H-x-[LIVMFYWGSPQ] [The two H's are zinc ligands]
[E is the active site residue]
Sequences known to belong to this class detected by the pattern: ALL, except for members of families M5, M7 and M11.
Other sequence(s) detected in SWISS-PROT: 55; including Neurospora crassa conidiation-specific protein 13 which
could be a zinc-protease.
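
The zinc-binding pattern is the only one in this passage that uses an exclusion element: `{DEHRKP}` accepts any residue except D, E, H, R, K or P. A short sketch of how that element behaves when the pattern is written as a regular expression is given below. The regex is a hand translation, and the name `zinc_site` and both test fragments are illustrative assumptions rather than database entries.

```python
import re

# Hand translation of
# [GSTALIVN]-x(2)-H-E-[LIVMFYW]-{DEHRKP}-H-x-[LIVMFYWGSPQ]
# The curly-brace element becomes a negated character class.
ZINC_SITE = re.compile(r"[GSTALIVN]..(H)(E)[LIVMFYW][^DEHRKP](H).[LIVMFYWGSPQ]")

def zinc_site(sequence):
    """Return 1-based positions of the two ligand H's and the active site E."""
    m = ZINC_SITE.search(sequence)
    if m is None:
        return None
    return {
        "ligand_H": [m.start(1) + 1, m.start(3) + 1],
        "active_E": m.start(2) + 1,
    }

print(zinc_site("AVVAHELTHAVT"))  # T between the histidines is allowed
print(zinc_site("AVVAHELKHAVT"))  # K is in the excluded set, so no match
```

The second call returns `None` because the only candidate site places an excluded residue at the `{DEHRKP}` position.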

[ 1]Jongeneel C.V., Bouvier J., Bairoch A.
FEBS Lett. 242:211-214(1989).
[ 2]Murphy G.J.P., Murphy G., Reynolds J.J.
FEBS Lett. 289:4-7(1991).
[ 3]Bode W., Grams F., Reinemer P., Gomis-Rueth F.-X., Baumann U., McKay D.B., Stoecker W.
Zoology 99:237-246(1996).
[ 4]Rawlings N.D., Barrett A.J.
Meth. Enzymol. 248:183-228(1995).
[ 5]Woessner J. Jr.
FASEB J. 5:2145-2154(1991).
[ 6]Hite L.A., Fox J.W., Bjarnason J.B.
[ 7]Montecucco C., Schiavo G.
Trends Biochem. Sci. 18:324-327(1993).
[ 8]Niemann H., Blasi J., Jahn R.
Trends Cell Biol. 4:179-185(1994).

[1737] 751. PseudoU_synth_1

tRNA pseudouridine synthase is involved in the formation of pseudouridine at the anticodon stem and loop of transfer
RNAs. Pseudouridine is an isomer of uridine (5-(beta-D-ribofuranosyl) uracil), and is the most abundant modified
nucleoside found in all cellular RNAs. The TruA-like proteins also exhibit a conserved sequence with a strictly conserved
aspartic acid, likely involved in catalysis. Number of members: 25

[1738] [1]Medline: 98254513. Transfer RNA-pseudouridine synthetase Pus1 of Saccharomyces cerevisiae contains
one atom of zinc essential for its native conformation and tRNA recognition. Arluison V, Hountondji C, Robert B,
Grosjean H; Biochemistry 1998;37:7268-7276.
[1739] 752. EPSP synthase signatures

EPSP synthase (3-phosphoshikimate 1-carboxyvinyltransferase) (EC 2.5.1.19) catalyzes the sixth step in the
biosynthesis from chorismate of the aromatic amino acids (the shikimate pathway) in bacteria (gene aroA), plants and fungi
(where it is part of a multifunctional enzyme which catalyzes five consecutive steps in this pathway) [1]. EPSP synthase
has been extensively studied as it is the target of the potent herbicide glyphosate, which inhibits the enzyme.

[1740] The sequence of EPSP from various biological sources shows that the structure of the enzyme has been well
conserved throughout evolution. Two conserved regions were selected as signature patterns. The first pattern
corresponds to a region that is part of the active site and which is also important for the resistance to glyphosate [2]. The
second pattern is located in the C-terminal part of the protein and contains a conserved lysine which seems to be
important for the activity of the enzyme.

[1741] Description of pattern(s) and/or profile(s)

[1742] Consensus pattern: [LIVM]-x(2)-[GN]-N-[SA]-G-T-[STA]-x-R-x-[LIVMY]-x-[GSTA]
Consensus pattern: [KR]-x-[KH]-...

[ 1] Stallings W.C., Abdel-Meguid S.S., Lim L.W., Shieh H.-S., Dayringer H.E., Leimgruber N.K., Stegeman R.A.,
Anderson K.S., Sikorski J.A., Padgette S.R., Kishore G.M. Proc. Natl. Acad. Sci. U.S.A. 88:5046-5050(1991).
[ 2] Padgette S.R., Re D.B., Gasser C.S., Eichholtz D.A., Frazier R.B., Hironaka C.M., Levine E.B., Shah D.M., Fraley
R.T., Kishore G.M. J. Biol. Chem. 266:22364-22369(1991).

[1743] 753. Glyco_hydro_18 

Glycosyl hydrolases family 18. Number of members: 173 

[1] Medline: 95219379. Crystal structure of a bacterial chitinase at 2.3 A resolution. Perrakis A, Tews I, Dauter Z,
Oppenheim AB, Chet I, Wilson KS, Vorgias CE; Structure 1994;2:1169-1180.
[1744] 754. Esterase 
Putative esterase 

This family contains Esterase D Swiss:P10768. However, it is not clear if all members of the family have the same
function. This family is possibly related to the COesterase family.
Number of members: 36

[1745] 755. (HMA) Heavy-metal-associated domain 

A conserved domain of about 30 amino acid residues has been found [1] in a number of proteins that transport or 
detoxify heavy metals. This domain contains two conserved cysteines that could be involved in the binding of these 
metals. The domain has been termed Heavy-Metal-Associated (HMA). It has been found in: 


- A variety of cation transport ATPases (E1-E2 ATPases) (see <PDOC00139>). The human copper ATPases ATP7A
  and ATP7B, which are respectively involved in Menkes' and Wilson's diseases. ATP7A and ATP7B both contain 6
  tandem copies of the HMA domain. The copper ATPases CCC2 from budding yeast, copA from Enterococcus
  faecalis and synA from Synechococcus contain one copy of the HMA domain. The cadmium ATPases cadA from
  Bacillus firmus and from plasmid pI258 from Staphylococcus aureus also contain a single HMA domain, while a
  chromosomal Staphylococcus aureus cadA contains two copies. Other, less characterized ATPases that contain
  the HMA domain are: fixI from Rhizobium meliloti, pacS from Synechococcus (strain PCC 7942), Mycobacterium
  leprae ctpA and ctpB and Escherichia coli hypothetical protein yhhO. In all these ATPases the HMA domain(s) are
  located in the N-terminal section.
- Mercuric reductase (EC 1.16.1.1) (gene merA), which is generally encoded by plasmids carried by mercury-resistant
  Gram-negative bacteria. Mercuric reductase is a class-1 pyridine nucleotide-disulphide oxidoreductase (see
  <PDOC00073>). There is generally one HMA domain (with the exception of a chromosomal merA from Bacillus
  strain RC607, which has two) in the N-terminal part of merA.
- Mercuric transport protein periplasmic component (gene merP), also encoded by plasmids carried by
  mercury-resistant Gram-negative bacteria. It seems to be a mercury scavenger that specifically binds to one Hg(2+) ion
  and which passes it to the mercuric reductase via the merT protein. The N-terminal half of merP is a HMA domain.
- Helicobacter pylori copper-binding protein copP.
- Yeast protein ATX1 [2], which could act in the transport and/or partitioning of copper.

[1746] The consensus pattern for HMA spans the complete domain.
[1747] Description of pattern(s) and/or profile(s)

Consensus pattern: [LIVN]-x(2)-[LIVMFA]-x-C-x-[STAGCDNH]-C-x(3)-[LIVFG]-x(3)-[LIV]-x(9,11)-[IVA]-x-[LVFYS]
[The two C's probably bind metals]

[ 1] Bull P.C., Cox D.W. Trends Genet. 10:246-252(1994).
[ 2] Lin S.-J., Culotta V.L. Proc. Natl. Acad. Sci. U.S.A. 92:3784-3788(1995).

[1748] 756. (Peptidase M10) Matrixins cysteine switch
PROSITE cross-reference(s): CYSTEINE_SWITCH

Mammalian extracellular matrix metalloproteinases (EC 3.4.24.-), also known as matrixins [1] (see <PDOC00129>),
are zinc-dependent enzymes. They are secreted by cells in an inactive form (zymogen) that differs from the mature
enzyme by the presence of an N-terminal propeptide. A highly conserved octapeptide is found two residues downstream
of the C-terminal end of the propeptide. This region has been shown to be involved in autoinhibition of matrixins [2,3];
a cysteine within the octapeptide chelates the active site zinc ion, thus inhibiting the enzyme. This region has been
called the 'cysteine switch' or 'autoinhibitor region'.
A cysteine switch has been found in the following zinc proteases:

- MMP-1 (EC 3.4.24.7) (interstitial collagenase).
- MMP-2 (EC 3.4.24.24) (72 Kd gelatinase).
- MMP-3 (EC 3.4.24.17) (stromelysin-1).
- MMP-7 (EC 3.4.24.23) (matrilysin).
- MMP-8 (EC 3.4.24.34) (neutrophil collagenase).
- MMP-9 (EC 3.4.24.35) (92 Kd gelatinase).
- MMP-10 (EC 3.4.24.22) (stromelysin-2).
- MMP-11 (EC 3.4.24.-) (stromelysin-3).
- MMP-12 (EC 3.4.24.65) (macrophage metalloelastase).
- MMP-13 (EC 3.4.24.-) (collagenase 3).
- MMP-14 (EC 3.4.24.-) (membrane-type matrix metalloproteinase 1).
- MMP-15 (EC 3.4.24.-) (membrane-type matrix metalloproteinase 2).
- MMP-16 (EC 3.4.24.-) (membrane-type matrix metalloproteinase 3).
- Sea urchin hatching enzyme (EC 3.4.24.12) (envelysin) [4].
- Chlamydomonas reinhardtii gamete lytic enzyme (GLE) [5].

[1749] Description of pattern(s) and/or profile(s) 

Consensus pattern: P-R-C-[GN]-x-P-[DR]-[LIVSAPKQ] [C chelates the zinc ion]

[ 1] Woessner J. Jr. FASEB J. 5:2145-2154(1991).
[ 2] Sanchez-Lopez R., Nicholson R., Gesnel M.C., Matrisian L.M., Breathnach R. J. Biol. Chem. 263:11892-11899(1988).
[ 3] Park A.J., Matrisian L.M., Kells A.F., Pearson R., Yuan Z., Navre M. J. Biol. Chem. 266:1584-1590(1991).
[ 4] Lepage T., Gache C. EMBO J. 9:3003-3012(1990).

[ 2] Siezen R.J. (In) Proceeding subtilisin symposium, Hamburg, (1992).
[ 3] Barr P.J. Cell 66:1-3(1991).
[ 4] Shaulsky G., Kuspa A., Loomis W.F. Genes Dev. 9:1111-1122(1995).
[ 5] Rawlings N.D., Barrett A.J. Meth. Enzymol. 244:19-61(1994).

[1752] 758. (SSB) Single-strand binding protein family signatures
PROSITE cross-reference(s): PS00735; SSB_1, PS00736; SSB_2

The Escherichia coli single-strand binding protein [1] (gene ssb), also known as the helix-destabilizing protein, is a
protein of 177 amino acids. It binds tightly, as a homotetramer, to single-stranded DNA (ss-DNA) and plays an important
role in DNA replication, recombination and repair.

[1753] Closely related variants of SSB are encoded in the genome of a variety of large self-transmissible plasmids.
SSB has also been characterized in bacteria such as Proteus mirabilis or Serratia marcescens.
[1754] Eukaryotic mitochondrial proteins that bind ss-DNA and are probably involved in mitochondrial DNA replication
are structurally and evolutionarily related to prokaryotic SSB. Proteins currently known to belong to this subfamily are
listed below [2].

- Mammalian protein Mt-SSB (P16).
- Xenopus Mt-SSBs and Mt-SSBr.
- Drosophila MtSSB.
- Yeast protein RIM1.

[1755] Two signature patterns have been developed for these proteins. The first is a conserved region in the
N-terminal section of the SSBs. The second is a centrally located region which, in Escherichia coli SSB, is known to be
involved in the binding of DNA.
[1756] Description of pattern(s) and/or profile(s)

Consensus pattern: [LIVMF]-[NST]-[KRT]-[LIVM]-x-[LIVMF](2)-G-[NHRK]-[LIVM]-[GST]-x-[DET]
Consensus pattern: T-x-W-[HY]-[RNS]-[LIVM]-x-[LIVMF]-[FY]-[NGKR]

[ 1] Meyer R.R., Laine P.S. Microbiol. Rev. 54:342-380(1990).
[ 2] Stroumbakis N.D., Li Z., Tolias P.P. Gene 143:171-177(1994).

[1757] 759. KDPG and KHG aldolases active site signatures

PROSITE cross-reference(s): PS00159; ALDOLASE_KDPG_KHG_1, PS00160; ALDOLASE_KDPG_KHG_2
[1758] 4-hydroxy-2-oxoglutarate aldolase (EC 4.1.3.16) (KHG-aldolase) catalyzes the interconversion of
4-hydroxy-2-oxoglutarate into pyruvate and glyoxylate. Phospho-2-dehydro-3-deoxygluconate aldolase (EC 4.1.2.14)
(KDPG-aldolase) catalyzes the interconversion of 6-phospho-2-dehydro-3-deoxy-D-gluconate into pyruvate and
glyceraldehyde 3-phosphate.

[1759] These two enzymes are structurally and functionally related [1]. They are both homotrimeric proteins of
approximately 220 amino-acid residues. They are class I aldolases whose catalytic mechanism involves the formation
of a Schiff-base intermediate between the substrate and the epsilon-amino group of a lysine residue. In both enzymes,
an arginine is required for catalytic activity.

[1760] Two signature patterns were developed for these enzymes. The first one contains the active site arginine and
the second, the lysine involved in the Schiff-base formation.

[1761] Description of pattern(s) and/or profile(s)
Consensus pattern: G-[LIVM]-x(3)-E-[LIV]-T-[LF]-R [R is the active site residue]
Consensus pattern: G-x(3)-[LIVMF]-K-[LF]-F-P-[SA]-x(3)-G [K is involved in Schiff-base formation]
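
Because each of the two patterns above is annotated with a single functional residue, a match can be turned directly into a residue position. The sketch below is illustrative only and assumes hand-written regular-expression equivalents of the two patterns; the names `SIGNATURES` and `annotate`, and the toy sequence, are hypothetical.

```python
import re

# Hand-written regex equivalents of the two consensus patterns; the named
# group "site" marks the annotated residue in each pattern.
SIGNATURES = {
    "active-site Arg": re.compile(r"G[LIVM].{3}E[LIV]T[LF](?P<site>R)"),
    "Schiff-base Lys": re.compile(r"G.{3}[LIVMF](?P<site>K)[LF]FP[SA].{3}G"),
}

def annotate(sequence):
    """Return 1-based positions of the annotated residue for each signature found."""
    found = {}
    for label, regex in SIGNATURES.items():
        m = regex.search(sequence)
        if m:
            found[label] = m.start("site") + 1
    return found

# Hypothetical toy sequence built to contain both motifs.
toy_sequence = "MSGLAAAEITLRDDDGAAAVKFFPAAAAGE"
positions = annotate(toy_sequence)
print(positions)
```

A sequence matching only one of the two signatures would yield a dictionary with a single entry, which may be a useful hint that the assignment to this enzyme family deserves closer inspection.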

[1762] [ 1] Vlahos C.J., Dekker E.E. J. Biol. Chem. 263:11683-11691(1988).

[1763] 760. AP endonucleases family 1 signatures
PROSITE cross-reference(s): PS00726; AP_NUCLEASE_F1_1, PS00727; AP_NUCLEASE_F1_2, PS00728;
AP_NUCLEASE_F1_3

[1764] DNA damaging agents such as the antitumor drugs bleomycin and neocarzinostatin or those that generate
oxygen radicals produce a variety of lesions in DNA. Amongst these is base-loss, which forms apurinic/apyrimidinic
(AP) sites or strand breaks with atypical 3' termini. DNA repair at the AP sites is initiated by specific endonuclease
cleavage of the phosphodiester backbone. Such endonucleases are also generally capable of removing blocking
groups from the 3' terminus of DNA strand breaks.

[1765] AP endonucleases can be classified into two families on the basis of sequence similarity. Family 1 groups
the enzymes listed below [1].

- Escherichia coli exonuclease III (EC 3.1.11.2) (gene xthA).
- Streptococcus pneumoniae and Bacillus subtilis exonuclease A (gene exoA).
- Mammalian AP endonuclease 1 (AP1) (EC 4.2.99.18).
- Drosophila recombination repair protein 1 (gene Rrp1).
- Arabidopsis thaliana apurinic endonuclease-redox protein (gene arp).

[1766] Except for Rrp1 and arp, these enzymes are proteins of about 300 amino-acid residues. Rrp1 and arp both
contain additional and unrelated sequences in their N-terminal section (about 400 residues for Rrp1 and 270 for arp).
[1767] Three signature patterns were developed for this family of enzymes. The patterns are based on the most
conserved regions. The first pattern contains a glutamate which has been shown [2], in the Escherichia coli enzyme,
to bind a divalent metal ion such as magnesium or manganese.

[1768] Consensus pattern: [APF]-D-[LIVMF](2)-x-[LIVM]-Q-E-x-K [E binds a divalent metal ion]
Consensus pattern: D-[ST]-[FY]-R-[KH]-x(7,8)-[FYW]-[ST]-[FYW](2)
Consensus pattern: N-x-G-x-R-[LIVM]-D-[LIVMFYH]-x-[LV]-x-S

[ 1] Barzilay G., Hickson I.D. BioEssays 17:713-719(1995).
[ 2] Mol C.D., Kuo C.-F., Thayer M.M., Cunningham R.P., Tainer J.A. Nature 374:381-386(1995).

[1769] 761. (ER) Enhancer of rudimentary signature. PROSITE cross-reference(s): PS01290; ER
[1770] The Drosophila protein 'enhancer of rudimentary' (gene e(r)) is a small protein of 104 residues whose function
is not yet clear. From an evolutionary point of view, it is highly conserved [1] and has been found to exist in probably
all multicellular eukaryotic organisms. It has been proposed that this protein plays a role in the cell cycle.

[1771] A conserved region in the central part of the protein was selected as a signature pattern.

[1772] Consensus pattern: Y-D-I-[SA]-x-L-[FY]-x-F-[IV]-D-x(3)-D-[LIV]-S
[1773] [ 1] Gelsthorpe M., Pulumati M., McCallum C., Dang-Vu K., Tsubota S.I. Gene 186:189-195(1997).

[1774] 762. (ETF alpha) Electron transfer flavoprotein alpha-subunit signature. PROSITE cross-reference(s):
PS00696; ETF_ALPHA

[1775] The electron transfer flavoprotein (ETF) [1,2] serves as a specific electron acceptor for various mitochondrial
dehydrogenases. ETF transfers electrons to the main respiratory chain via ETF-ubiquinone oxidoreductase. ETF is a
heterodimer that consists of an alpha and a beta subunit and which binds one molecule of FAD per dimer. A similar
system also exists in some bacteria.

[1776] The alpha subunit of ETF is a protein of about 32 Kd which is structurally related to the bacterial nitrogen
fixation protein fixB, which could play a role in a redox process and feed electrons to ferredoxin.

[1777] Other related proteins are:

- Escherichia coli hypothetical protein ydiR.
- Escherichia coli hypothetical protein ygcQ.

[1778] A highly conserved region which is located in the C-terminal section was selected as a signature pattern for
these proteins.
[1779] Consensus pattern: [LI]-Y-[LIVM]-[AT]-x-G-[IV]-[SD]-G-x-[IV]-Q-H-x(2)-G-x(6)-[IV]-x-A-[IV]-N

[ 1] Finocchiaro G., Ikeda Y., Ito M., Tanaka K. Prog. Clin. Biol. Res. 321:637-652(1990).
[ 2] Tsai M.H., Saier M.H. Jr. Res. Microbiol. 146:397-404(1995).


[1780] 763. (lectin c) C-type lectin domain signature and profile

PROSITE cross-reference(s): PS00615; C_TYPE_LECTIN_1, PS50041; C_TYPE_LECTIN_2

[1781] A number of different families of proteins share a conserved domain which was first characterized in some
animal lectins and which seems to function as a calcium-dependent carbohydrate-recognition domain [1,2,3]. This
domain, which is known as the C-type lectin domain (CTL) or as the carbohydrate-recognition domain (CRD), consists
of about 110 to 130 residues. There are four cysteines which are perfectly conserved and involved in two disulfide
bonds. A schematic representation of the CTL domain is shown below.

- Endothelial leukocyte adhesion molecule 1 (ELAM-1, E-selectin or LECAM-2).
[1786] The ligand recognized by ELAM-1 is sialyl-Lewis x.
- Granule membrane protein 140 (GMP-140, P-selectin, PADGEM, CD62, or LECAM-3). The ligand recognized by
  GMP-140 is Lewis x.

[1787] Large proteoglycans that contain a CTL domain followed by one copy of a SCR/Sushi repeat, in their
C-terminal section:

- Aggrecan (cartilage-specific proteoglycan core protein). This proteoglycan is a major component of the extracellular
  matrix of cartilaginous tissues where it has a role in the resistance to compression.
- Brevican.
- Versican (large fibroblast proteoglycan), a large chondroitin sulfate proteoglycan that may play a role in intercellular
  signalling.

[1788] In addition to the CTL and Sushi domains, these proteins also contain, in their N-terminal domain, an Ig-like
V-type region, two or four link domains (see <PDOC00955>) and up to two EGF-like repeats.
[1789] Two type-I membrane proteins:

- Mannose receptor from macrophages. This protein mediates the endocytosis of ... glycoproteins by macrophages
  in several recognition and uptake processes. Its extracellular section consists of a fibronectin type II domain
  followed by eight tandem repeats of the CTL domain.
- 180 Kd secretory phospholipase A2 receptor (PLA2-R). A protein whose structure is highly similar to that of the
  mannose receptor.
- DEC-205 receptor. This protein is used by dendritic cells and thymic epithelial cells to capture and endocytose
  diverse carbohydrate-binding antigens and direct them to antigen-processing cellular compartments. The
  extracellular section of DEC-205 consists of a fibronectin type II domain followed by ten tandem repeats of the CTL
  domain.
- Silk moth hemocytin, a humoral lectin which is involved in a self-defence mechanism. It is composed of 2 FA58C
  domains (see <PDOC00988>), a CTL domain, 2 VWFC domains (see <PDOC00928>), and a CTCK (see
  <PDOC00912>).


[1790] Various other proteins that uniquely consist of a CTL domain:

- Invertebrate soluble galactose-binding lectins. A category to which belong a humoral lectin from a flesh fly;
  echinoidin, a lectin from the coelomic fluid of a sea urchin; BRA-2 and BRA-3, two lectins from the coelomic fluid of a
  barnacle; a lectin from the tunicate Polyandrocarpa misakiensis; and a newt oviduct lectin. The physiological
  importance of these lectins is not yet known but they may play an important role in defense mechanisms.
- Pancreatic stone protein (PSP) (also known as pancreatic thread protein (PTP), or reg), a protein that might act
  as an inhibitor of spontaneous calcium carbonate precipitation.
- Pancreatitis associated protein (PAP), a protein that might be involved in the control of bacterial proliferation.
- Tetranectin, a plasma protein that binds to plasminogen and to isolated kringle 4.
- Eosinophil granule major basic protein (MBP), a cytotoxic protein.
- A galactose-specific lectin from a rattlesnake.
- Two subunits of a coagulation factor IX/factor X-binding protein (IX/X-bp), a snake venom anticoagulant protein
  which binds with factors IX and X in the presence of calcium.
- Two subunits of a phospholipase A2 inhibitor from the plasma of a snake (PLI-A and PLI-B).
- A lipopolysaccharide-binding protein (LPS-BP) from the hemolymph of a cockroach [8].
- Sea raven antifreeze protein (AFP) [9].

[1791] As a signature pattern for this domain, the C-terminal region with its three conserved cysteines was selected.
[1792] Consensus pattern: C-[LIVMFYATG]-x(5,12)-[WL]-x-[DNSR]-x(2)-C-x(5,6)-[FYWLIVSTA]-[LIVMSTA]-C [The
three C's are involved in disulfide bonds]

Note: all CTL domains have a Trp residue five positions before the second Cys, with the exception of tunicate lectin
and cockroach LPS-BP, which have Leu.

Note: this documentation entry is linked to both a signature pattern and a profile. As the profile is much more sensitive
than the pattern, you should use it if you have access to the necessary software tools to do so.

[ 1] Drickamer K. J. Biol. Chem. 263:9557-9560(1988).
[ 2] Drickamer K. Prog. Nucleic Acid Res. Mol. Biol. 45:207-232(1993).
[ 3] Drickamer K. Curr. Opin. Struct. Biol. 3:393-400(1993).
[ 4] Spiess M. Biochemistry 29:10009-10018(1990).
[ 5] Weis W.I., Kahn R., Fourme R., Drickamer K., Hendrickson W.A. Science 254:1608-1615(1991).
[ 6] Siegelman M. Curr. Biol. 1:125-128(1991).
[ 7] Lasky L.A. Science 258:964-969(1992).
[ 8] Jomori T., Natori S. J. Biol. Chem. 266:13318-13323(1991).
[ 9] Ng N.F.L., Hew C.-L. J. Biol. Chem. 267:16069-16075(1992).

[1793] 764. (SRCR) Speract receptor repeated domain signature

PROSITE cross-reference(s): PS00420; SPERACT_RECEPTOR
[1794] The receptor for the sea urchin egg peptide speract is a transmembrane glycoprotein of 500 amino acid
residues [1]. Structurally it consists of a large extracellular domain of 450 residues, followed by a transmembrane
region and a small cytoplasmic domain of 12 amino acids. The extracellular domain contains four repeats of a
115-amino-acid domain. There are 17 positions that are perfectly conserved in the four repeats; among them are six
cysteines, six glycines, and three glutamates.
[1795] Such a domain is also found, once, in the C-terminal section of mammalian macrophage scavenger receptor
type I [2], a membrane glycoprotein implicated in the pathologic deposition of cholesterol in arterial walls during
atherogenesis.

[1796] The signature pattern that was derived spans part of the N-terminal section of the domain and contains 8 of
the 17 conserved residues.
[1797] Consensus pattern: G-x(5)-G-x(2)-E-x(6)-W-G-x(2)-C-x(3)-[FYW]-x(8)-C-x(3)-G
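
The statement that the pattern contains 8 of the 17 conserved residues can be checked by counting the positions at which the pattern allows exactly one residue. The following is a small illustrative sketch, not part of the original disclosure; the helper name `summarize` is an assumption.

```python
import re

SRCR_PATTERN = "G-x(5)-G-x(2)-E-x(6)-W-G-x(2)-C-x(3)-[FYW]-x(8)-C-x(3)-G"

def summarize(pattern):
    """Return (invariant residues, minimum length, maximum length) of a pattern."""
    invariant, min_len, max_len = [], 0, 0
    for element in pattern.split("-"):
        # Split each element into its residue specification and repeat count.
        m = re.fullmatch(r"(.+?)(?:\((\d+)(?:,(\d+))?\))?", element)
        core = m.group(1)
        lo = int(m.group(2) or 1)
        hi = int(m.group(3) or lo)
        min_len += lo
        max_len += hi
        # A single letter other than "x" is a fully conserved position.
        if len(core) == 1 and core != "x":
            invariant.extend(core * lo)
    return invariant, min_len, max_len

invariant, min_len, max_len = summarize(SRCR_PATTERN)
print("".join(invariant), len(invariant), min_len, max_len)
```

For this pattern the sketch reports eight invariant positions (four glycines, one glutamate, one tryptophan and two cysteines) over a fixed span of 38 residues, which is consistent with the count given in the text.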

[ 1] Dangott J.J., Jordan J.E., Bellet R.A., Garbers D.L. Proc. Natl. Acad. Sci. U.S.A. 86:2128-2132(1989).
[ 2] Freeman M., Ashkenas J., Rees D.J., Kingsley D.M., Copeland N.G., Jenkins N.A., Krieger M. Proc. Natl.
Acad. Sci. U.S.A. 87:8810-8814(1990).

[1798] 765. Bac_surface_Ag
Bacterial surface antigen

This entry includes the following surface antigens: D15 antigen from H. influenzae, OMA87 from P. multocida, OMP85
from N. meningitidis and N. gonorrhoeae. Number of members: 14

[1] Medline: 95255676. The sequencing of the 80-kDa D15 protective surface antigen of Haemophilus influenzae.
Flack FS, Loosmore S, Chong P, Thomas WR; Gene 1995;156:97-99.

[2] Medline: 96333354. Cloning, sequencing, expression, and protective capacity of the oma87 gene encoding the
Pasteurella multocida 87-kilodalton outer membrane antigen. Ruffolo CG, Adler B; Infect Immun 1996;64:
3161-3167.

[1799] 766. BRCA1 C Terminus (BRCT) domain

The BRCT domain is found predominantly in proteins involved in cell cycle checkpoint functions responsive to DNA
damage. It has been suggested that the Retinoblastoma protein contains a divergent BRCT domain; this has not been
included in this family. The BRCT domain of XRCC1 forms a homodimer in the crystal structure Medline:99016060.
This suggests that pairs of BRCT domains associate as homo- or heterodimers. Number of members: 131

[1] Medline: 96259550. BRCA1 protein products ...Functional motifs... Koonin EV, Altschul SF, Bork P; Nature
Genet 1996;13:266-268.

[2] Medline: 97153217. From BRCA1 to RAP1: A widespread BRCT module closely associated with DNA repair.
Callebaut I, Mornon JP; FEBS Lett 1997;400:25-30.

[3] Medline: 97186552. A superfamily of conserved domains in DNA damage responsive cell cycle checkpoint
proteins. Bork P, Hofmann K, Bucher P, Neuwald AF, Altschul SF, Koonin EV; FASEB J 1997;11:68-76.
[4] Medline: 97402527. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs.
Altschul SF, Madden TL, Schaffer AA, Zhang J, Zhang Z, Miller W, Lipman DJ; Nucleic Acids Res 1997;25:
3389-3402.

[5] Medline: 99016060. Structure of an XRCC1 BRCT domain: a new protein-protein interaction module. Zhang

- Escherichia coli lipoprotein nlpC.
- Escherichia coli lipoprotein nlpD.
- Escherichia coli osmotically inducible lipoprotein B (gene osmB).
- Escherichia coli osmotically inducible lipoprotein E (gene osmE).
- Escherichia coli peptidoglycan-associated lipoprotein (gene pal).
- Escherichia coli rare lipoproteins A and B (genes rplA and rplB).
- Escherichia coli copper homeostasis protein cutF (or nlpE).
- Escherichia coli plasmids traT proteins.
- Escherichia coli Col plasmids lysis proteins.
- A number of Bacillus beta-lactamases.
- Bacillus subtilis periplasmic oligopeptide-binding protein (gene oppA).
- Borrelia burgdorferi outer surface proteins A and B (genes ospA and ospB).
- Borrelia hermsii variable major protein 21 (gene vmp21) and 7 (gene vmp7).
- Chlamydia trachomatis outer membrane protein 3 (gene omp3).
- Fibrobacter succinogenes endoglucanase cel-3.
- Haemophilus influenzae proteins Pal and Pcp.
- Klebsiella pullulanase (gene pulA).
- Klebsiella pullulanase secretion protein pulS.
- Mycoplasma hyorhinis protein p37.
- Mycoplasma hyorhinis variant surface antigens A, B, and C (genes vlpABC).
- Neisseria outer membrane protein H.8.
- Pseudomonas aeruginosa lipopeptide (gene lppL).
- Pseudomonas solanacearum endoglucanase egl.
- Rhodopseudomonas viridis reaction center cytochrome subunit (gene cytC).
- Rickettsia 17 Kd antigen.
- Shigella flexneri invasion plasmid proteins mxiJ and mxiM.
- Streptococcus pneumoniae oligopeptide transport protein A (gene amiA).
- Treponema pallidum 34 Kd antigen.
- Treponema pallidum membrane protein A (gene tmpA).
- Vibrio harveyi chitobiase (gene chb).
- Yersinia virulence plasmid protein yscJ.
- Halocyanin from Natronobacterium pharaonis [4], a membrane-associated copper-binding protein. This is the first
  archaebacterial protein known to be modified in such a fashion.


35 


[1818] From the precursor sequences of all these proteins, we derived a consensus pattern and a set of rules to
identify this type of post-translational modification.

[1819] Consensus pattern: {DERK}(6)-[LIVMFWSTAG](2)-[LIVMFYSTAGCQ]-[AGS]-C [C is the lipid attachment site]
Additional rules: 1) The cysteine must be between positions 15 and 35 of the sequence in consideration. 2) There must
be at least one Lys or one Arg in the first seven positions of the sequence.
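
The consensus pattern and the two additional rules combine into a simple filter. The sketch below is a minimal illustration under stated assumptions, not a definitive implementation: the regular expression is a hand translation of the pattern, the function name `lipid_attachment_sites` is hypothetical, positions are taken to be 1-based and inclusive, and the test sequences are toy examples.

```python
import re

# Hand translation of {DERK}(6)-[LIVMFWSTAG](2)-[LIVMFYSTAGCQ]-[AGS]-C.
# The lookahead lets overlapping candidate sites be reported.
LIPOBOX = re.compile(r"(?=([^DERK]{6}[LIVMFWSTAG]{2}[LIVMFYSTAGCQ][AGS]C))")

def lipid_attachment_sites(precursor):
    """Return 1-based positions of cysteines satisfying the pattern and both rules."""
    # Rule 2: at least one Lys or Arg in the first seven positions.
    if not re.search(r"[KR]", precursor[:7]):
        return []
    sites = []
    for m in LIPOBOX.finditer(precursor):
        # The cysteine is the last residue of the 11-residue match.
        cys_pos = m.start(1) + len(m.group(1))
        # Rule 1: the cysteine must lie between positions 15 and 35.
        if 15 <= cys_pos <= 35:
            sites.append(cys_pos)
    return sites

# Hypothetical toy precursors used only to exercise the rules.
print(lipid_attachment_sites("MKATKLVLGAVILGSTLLAGCSSN"))  # passes both rules
print(lipid_attachment_sites("MAATALVLGAVILGSTLLAGCSSN"))  # no Lys/Arg in first seven
print(lipid_attachment_sites("MKVILGSTLLAGCSSN"))          # cysteine too early
```

Checking the positively charged N-terminus first keeps the function cheap for sequences that cannot qualify regardless of where the pattern matches.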

[ 1] Hayashi S., Wu H.C. J. Bioenerg. Biomembr. 22:451-471(1990).
[ 2] Klein P., Somorjai R.L., Lau P.C.K. Protein Eng. 2:15-20(1988).
[ 3] von Heijne G. Protein Eng. 2:531-534(1989).
[ 4] Mattar S., Scharf B., Kent S.B.H., Rodewald K., Oesterhelt D., Engelhard M. J. Biol. Chem. 269:14939-14945(1994).

[1820] 774. Aminoacyl-transfer RNA synthetases class-II signatures
PROSITE cross-reference(s): AA_TRNA_LIGASE_II_1; AA_TRNA_LIGASE_II_2

Aminoacyl-tRNA synthetases (EC 6.1.1.-) [1] are a group of enzymes which activate amino acids and transfer them to
specific tRNA molecules as the first step in protein biosynthesis. In prokaryotic organisms there are at least twenty
different types of aminoacyl-tRNA synthetases, one for each different amino acid. In eukaryotes there are generally two
aminoacyl-tRNA synthetases for each different amino acid: one cytosolic form and a mitochondrial form. While all these
enzymes have a common function, they are widely diverse in terms of subunit size and of quaternary structure.

[1821] The synthetases specific for alanine, asparagine, aspartic acid, glycine, histidine, lysine, phenylalanine,
proline, serine, and threonine are referred to as class-II synthetases [2 to 6] and probably have a common folding pattern
in their catalytic domain for the binding of ATP and amino acid which is different from the Rossmann fold observed for
the class I synthetases [7].

[1834] 776. Urease signatures
[1835] Urease (EC 3.5.1.5) is a nickel-binding enzyme that catalyzes the hydrolysis of urea to carbon dioxide and
ammonia [1]. Historically, it was the first enzyme to be crystallized (in 1926). It is mainly found in plant seeds,
microorganisms and invertebrates. In plants, urease is a hexamer of identical chains. In bacteria [2], it consists of
either two or three different subunits (alpha, beta and gamma).
[1836] Urease binds two nickel ions per subunit; four histidines, an aspartate and a carbamated lysine serve as ligands
to these metals; an additional histidine is involved in the catalytic mechanism [3].

[1837] As signatures for this enzyme, a region was selected that contains two histidines that bind one of the nickel
ions and the region of the active site histidine.

[1838] Consensus pattern: T-[AY]-[GA]-[GAT]-[LIVM]-D-x-H-[LIVM]-H-x(3)-P [The two H's bind nickel]
Consensus pattern: [LIVM](2)-[CT]-H-[HN]-L-x(3)-[LIVM]-x(2)-D-[LIVM]-x-F-A [H is the active site residue]

[ 1] Takishima K., Suga T., Mamiya G. Eur. J. Biochem. 175:151-165(1988).
[ 2] Mobley H.L.T., Hausinger R.P. Microbiol. Rev. 53:85-108(1989).
[ 3] Jabri E., Carr M.B., Hausinger R.P., Karplus P.A. Science 268:998-1004(1995).

[1839] 779. Tyrosine specific protein phosphatases signature and profiles

[1840] Tyrosine specific protein phosphatases (EC 3.1.3.48) (PTPase) [1 to 5] are enzymes that catalyze the removal
of a phosphate group attached to a tyrosine residue. These enzymes are very important in the control of cell growth,
proliferation, differentiation and transformation. Multiple forms of PTPase have been characterized and can be
classified into two categories: soluble PTPases and transmembrane receptor proteins that contain PTPase domain(s). The
currently known PTPases are listed below:
[1841] Soluble PTPases.

- PTPN1 (PTP-1B). 

- PTPN3 (H1) and PTPN4 (MEG), enzymes that contain an N-terminal band 4.1-like domain (see <PDOC00566>)
  and could act at junctions between the membrane and cytoskeleton.

- PTPN6 (PTP-1C; HCP; SHP) and PTPN11 (PTP-2C; SH-PTP3; Syp), enzymes which contain two copies of the
  SH2 domain at their N-terminal extremity. The Drosophila protein corkscrew (gene csw) also belongs to this subgroup.
- PTPN7 (LC-PTP; Hematopoietic protein-tyrosine phosphatase; HePTP).
- PTPN8 (70Z-PEP).
- PTPN9 (MEG2).
- PTPN12 (PTP-G1; PTP-P19).
- Yeast PTP1.
- Yeast PTP2, which may be involved in the ubiquitin-mediated protein degradation pathway.
- Fission yeast pyp1 and pyp2, which play a role in inhibiting the onset of mitosis.
- Fission yeast pyp3, which contributes to the dephosphorylation of cdc2.
- Yeast CDC14, which may be involved in chromosome segregation.
- Yersinia virulence plasmid PTPases (gene yopH).
- Autographa californica nuclear polyhedrosis virus 19 Kd PTPase.

[1842] Dual specificity PTPases.
- DUSP1 (PTPN10; MAP kinase phosphatase-1; MKP-1), which dephosphorylates MAP kinase on both Thr-183
  and Tyr-185.
- DUSP2 (PAC-1), a nuclear enzyme that dephosphorylates MAP kinases ERK1 and ERK2 on both Thr and Tyr
  residues.
- DUSP3 (VHR).
- DUSP4 (HVH2).
- DUSP5 (HVH3).
- DUSP6 (Pyst1; MKP-3).
- DUSP7 (Pyst2; MKP-X).
- Yeast MSG5, a PTPase that dephosphorylates MAP kinase FUS3.
- Yeast YVH1.
- Vaccinia virus H1 PTPase, a dual specificity phosphatase.

[1843] Receptor PTPases.
[1844] Structurally, all known receptor PTPases are made up of a variable-length extracellular domain, followed by
a transmembrane region and a C-terminal catalytic cytoplasmic domain. Some of the receptor PTPases contain
fibronectin type III (FN-III) repeats, immunoglobulin-like domains, MAM domains or carbonic anhydrase-like domains
in their extracellular region. The cytoplasmic region generally contains two copies of the PTPase domain. The first
seems to have enzymatic activity, while the second is inactive but seems to affect substrate specificity of the first. In
these domains, the catalytic cysteine is generally conserved but some other, presumably important, residues are not.
[1845] In the following table, the domain structure of known receptor PTPases is shown:


                                          Extracellular          Intracellular
                                          Ig   FN-3  CAH  MAM    PTPase
Leukocyte common antigen (LCA) (CD45)     0    2     0    0      2
Leukocyte antigen related (LAR)           3    8     0    0      2
Drosophila DLAR                           3    9     0    0      2
Drosophila DPTP                           2    2     0    0      2
PTP-alpha (LRP)                           0    0     0    0      2
PTP-beta                                  0    16    0    0      1
PTP-gamma                                 0    1     1    0      2
PTP-delta                                 0    >7    0    0      2
PTP-epsilon                               0    0     0    0      2
PTP-kappa                                 1    4     0    1      2
PTP-mu                                    1    4     0    1      2
PTP-zeta                                  0    1     1    0      2


[1846] PTPase domains consist of about 300 amino acids. There are two conserved cysteines; the second one has
been shown to be absolutely required for activity. Furthermore, a number of conserved residues in its immediate vicinity
have also been shown to be important.

[1847] A signature pattern was derived for PTPase domains centered on the active site cysteine. 

[1848] There are three profiles for PTPases: the first one spans the complete domain and is not specific to any
subtype. The second profile is specific to dual-specificity PTPases and the third one to the PTP subfamily.

[1849] Consensus pattern: [LIVMF]-H-C-x(2)-G-x(3)-[STC]-[STAGP]-x-[LIVMFY] [C is the active site residue]
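
A consensus pattern written in this notation can be translated mechanically into a regular expression. The following Python sketch is one possible translation, not the tool used to derive the signatures: square brackets list allowed residues, `x` stands for any residue, and a number or range in parentheses is a repeat count. The function name `prosite_to_regex` and the toy peptide are assumptions made for illustration.

```python
import re


def prosite_to_regex(pattern):
    """Translate a PROSITE-style pattern into a Python regular expression."""
    parts = []
    for element in pattern.split("-"):
        # Split off an optional repeat suffix such as "(2)" or "(1,3)".
        m = re.fullmatch(r"([^()]+)(?:\((\d+)(?:,(\d+))?\))?", element)
        if m is None:
            raise ValueError(f"cannot parse element {element!r}")
        core, low, high = m.groups()
        if core == "x":
            core = "."  # x means any residue
        elif core.startswith("{") and core.endswith("}"):
            core = "[^" + core[1:-1] + "]"  # {..} means any residue except these
        # A bracketed set such as [LIVMF] or a single letter is already valid regex.
        if low is not None:
            core += "{" + low + ("," + high if high else "") + "}"
        parts.append(core)
    return "".join(parts)


PTPASE = "[LIVMF]-H-C-x(2)-G-x(3)-[STC]-[STAGP]-x-[LIVMFY]"
regex = re.compile(prosite_to_regex(PTPASE))

# Hypothetical peptide constructed by hand so that it satisfies the pattern.
toy = "AAAVHCSAGVGRSGTFAAA"
hit = regex.search(toy)
print(regex.pattern)
print(hit.group(0), "active-site Cys at 0-based index", hit.start() + 2)
```

The active-site cysteine is the third element of the pattern, so its position is the start of the match plus two.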

[1850] Note: the M-phase inducer phosphatases (cdc25-type phosphatase) are tyrosine-protein phosphatases that
are not structurally related to the above PTPases.

[1851] Note: this documentation entry is linked to both a signature pattern and to profiles. As profiles are much more
sensitive than the pattern, you should use them if you have access to the necessary software tools to do so.

[ 1] Fischer E.H., Charbonneau H., Tonks N.K. Science 253:401-406(1991). 
[ 2] Charbonneau H., Tonks N.K. Annu. Rev. Cell Biol. 8:463-493(1992). 
[ 3] Trowbridge I.S. J. Biol. Chem. 266:23517-23520(1991). 
[ 4] Tonks N.K., Charbonneau H. Trends Biochem. Sci. 14:497-500(1989). 
[ 5] Hunter T. Cell 58:1013-1016(1989). 

[1852] 780. Connexins signatures 

[1853] Gap junctions [1] are specialized regions of the plasma membrane which consist of closely packed pairs of 
transmembrane channels, the connexons, through which small molecules diffuse from a cell to a neighboring cell. Each 
connexon is composed of a hexamer of an integral membrane protein which is often referred to as connexin. In a
given species there are a number of different, yet structurally related, tissue specific, forms of connexins. The types 
of connexins which are currently known are listed below. 

Connexin 56 (Cx56). 

Connexin 50 (Cx50) (lens fiber protein MP70). 
Connexin 46 (Cx46) (alpha-3). 
Connexin 45 (Cx45) (alpha-6). 
Connexin 43 (Cx43) (alpha-1). 

Connexin 40 (Cx40) (alpha-5).
Connexin 38 (Cx38) (alpha-2). 
[1874] 784. Cellulose-binding domain, bacterial type

[1875] The microbial degradation of cellulose and xylans requires several types of enzyme such as endoglucanases
(EC 3.2.1.4), cellobiohydrolases (EC 3.2.1.91) (exoglucanases), or xylanases (EC 3.2.1.8).

[1876] Structurally, cellulases and xylanases generally consist of a catalytic domain joined to a cellulose-binding
domain (CBD) by a short linker sequence rich in proline and/or hydroxy-amino acids.

[1877] The CBD of a number of bacterial cellulases has been shown to consist of about 105 amino acid residues
[2]. Enzymes known to contain such a domain are:

- Endoglucanase (gene end1) from Butyrivibrio fibrisolvens.

- Endoglucanases A (gene cenA) and B (cenB) from Cellulomonas fimi.

- Exoglucanases A (gene cbhA) and B (cbhB) from Cellulomonas fimi.

- Endoglucanase E-2 (gene celB) from Thermomonospora fusca.

- Endoglucanase A (gene celA) from Microbispora bispora.

- Endoglucanases A (gene celA), B (celB) and C (celC) from Pseudomonas fluorescens.
- Endoglucanase A (gene celA) from Streptomyces lividans.

- Exocellobiohydrolase (gene cex) from Cellulomonas fimi.

- Xylanases A (gene xynA) and B (xynB) from Pseudomonas fluorescens.

- Arabinofuranosidase C (EC 3.2.1.55) (xylanase C) (gene xynC) from Pseudomonas fluorescens.

- Chitinase 63 (EC 3.2.1.14) from Streptomyces plicatus.
- Chitinase C from Streptomyces lividans.

[1878] The CBD domain is found either at the N-terminal or at the C-terminal extremity of these enzymes. As
shown in the following schematic representation, there are two conserved cysteines in this CBD domain - one at each
extremity of the domain - which have been shown [3] to be involved in a disulfide bond. There are also four conserved
tryptophan residues which could be involved in the interaction of the CBD with polysaccharides.


 +-------------------------------------------------+
 |                                                 |
xCxxxxWxxxxxNxxxWxxxxxxxWxxxxxxxxWNxxxxxGxxxxxxxxxxCx
                                 **************

'C': conserved cysteine involved in a disulfide bond. '*': position of the pattern.

Consensus pattern: W-N-[STAGR]-[STDN]-[LIVM]-x(2)-[GST]-x-[GST]-x(2)-[LIVMFT]-[GA]
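
The statements about the schematic (two conserved cysteines, four conserved tryptophans, and a pattern that begins at a W-N pair) can be checked directly against the schematic string. The short Python sketch below does this; the helper name `positions` is an assumption made for illustration, and the positions it reports refer to the schematic only, not to residue numbers in any real enzyme.

```python
# Schematic of the bacterial-type CBD as given in the text ('x' = any residue).
SCHEMATIC = "xCxxxxWxxxxxNxxxWxxxxxxxWxxxxxxxxWNxxxxxGxxxxxxxxxxCx"


def positions(schematic, residue):
    """Return the 1-based positions of a residue letter in the schematic."""
    return [i + 1 for i, ch in enumerate(schematic) if ch == residue]


cys = positions(SCHEMATIC, "C")
trp = positions(SCHEMATIC, "W")

print("conserved Cys positions:", cys)
print("conserved Trp positions:", trp)
# The signature pattern starts with W-N, so it begins at the only W followed by N.
print("pattern starts at schematic position", SCHEMATIC.index("WN") + 1)
```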

[ 1] Gilkes N.R., Henrissat B., Kilburn D.G., Miller R.C. Jr., Warren R.A.J. Microbiol. Rev. 55:303-315(1991).

[ 2] Meinke A., Gilkes N.R., Kilburn D.G., Miller R.C. Jr., Warren R.A.J. Protein Seq. Data Anal. 4:349-353(1991).
[ 3] Gilkes N.R., Claeyssens M., Aebersold R., Henrissat B., Meinke A., Morrison H.D., Kilburn D.G., Warren R.A.J.,
Miller R.C. Jr. Eur. J. Biochem. 202:367-377(1991).

[1879] 785. Amidases signature

[1880] It has been shown [1,2,3] that several enzymes from various prokaryotic and eukaryotic organisms which are
involved in the hydrolysis of amides (amidases) are evolutionarily related. These enzymes are listed below.

- Indoleacetamide hydrolase (EC 3.5.1.-), a bacterial plasmid-encoded enzyme that catalyzes the hydrolysis of
indole-3-acetamide (IAM) into indole-3-acetate (IAA), the second step in the biosynthesis of auxins from tryptophan.

- Acetamidase from Emericella nidulans (gene amdS), an enzyme which allows acetamide to be used as a sole
carbon or nitrogen source.

- Amidase (EC 3.5.1.4) from Rhodococcus sp. N-774 and Brevibacterium sp. R312 (gene amdA). This enzyme
hydrolyzes propionamides efficiently, and also at a lower efficiency, acetamide, acrylamide and indoleacetamide.

- Amidase (EC 3.5.1.4) from Pseudomonas chlororaphis.
- 6-aminohexanoate-cyclic-dimer hydrolase (EC 3.5.2.12) (nylon oligomers degrading enzyme E1) (gene nylA), a
bacterial plasmid encoded enzyme which catalyzes the first step in the degradation of 6-aminohexanoic acid cyclic
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A pulative amidase Horn ^' 'g^g amidases amiA2, amiB2, amiC and amiD. 
Consensus pattern: G-lGA]-S-l^j i ) 


,0 [LIVMJ-R-X-P-lGSACl 172:6 764-6773(1990). 

S^^jSL S P Bogef 0,. ^ - — — 
1 61 C.avatl B.F.. Giang D.K-. MayliaH 

- Aspergillus awamori xylanase A (xynA).

- Bacillus sp. strain 125 xylanase (xynA).
[ 4] Henrissat B. Biochem. J. 280:309-316(1991).

[ 5] Tull D., Withers S.G., Gilkes N.R., Kilburn D.G., Warren R.A.J., Aebersold R. J. Biol. Chem. 266:15621-15625
(1991).

[1886] 787. Fructose-bisphosphate aldolase class-II signatures

[1887] Fructose-bisphosphate aldolase (EC 4.1.2.13) [1,2] is a glycolytic enzyme that catalyzes the reversible aldol
cleavage or condensation of fructose-1,6-bisphosphate into dihydroxyacetone-phosphate and glyceraldehyde
3-phosphate. There are two classes of fructose-bisphosphate aldolases with different catalytic mechanisms. Class-II aldolases,
mainly found in prokaryotes and fungi, are homodimeric enzymes which require a divalent metal ion - generally
zinc - for their activity.

[1888] This family also includes the following proteins:

- Escherichia coli galactitol operon protein gatY which catalyzes the transformation of tagatose 1,6-bisphosphate
into glycerone phosphate and D-glyceraldehyde 3-phosphate.

- Escherichia coli N-acetyl galactosamine operon protein agaY which catalyzes the same reaction as that of gatY.

[1889] As signature patterns for this class of enzyme, two conserved regions were selected. The first pattern is
located in the first half of the sequence and contains two histidine residues that have been shown [4] to be involved in
binding a zinc ion. The second is located in the C-terminal section and contains clustered acidic residues and glycines.
[1890] Consensus pattern: [FYVMT]-x(1,3)-[LIVMH]-[APN]-[LIVM]-x(1,2)-[LIVM]-H-x-D-H-[GACH] [The two H's are
zinc ligands]

Consensus pattern: [LIVM]-E-x-E-[LIVM]-G-x(2)-[GM]-[GSTA]-x-E

[ 1] Perham R.N. Biochem. Soc. Trans. 18:185-187(1990).
[ 2] Marsh J.J., Lebherz H.G. Trends Biochem. Sci. 17:110-113(1992).

[ 3] von der Osten C.H., Barbas C.F. III, Wong C.-H., Sinskey A.J. Mol. Microbiol. 3:1625-1637(1989).
[ 4] Berry A., Marshall K.E. FEBS Lett. 318:11-16(1993).

[1891] 788. Prolyl oligopeptidase family serine active site
[1892] The prolyl oligopeptidase family [1,2,3] consists of a number of evolutionarily related peptidases whose catalytic
activity seems to be provided by a charge relay system similar to that of the trypsin family of serine proteases, but
which evolved by independent convergent evolution. The known members of this family are listed below.

- Prolyl endopeptidase (EC 3.4.21.26) (PE) (also called post-proline cleaving enzyme). PE is an enzyme that cleaves
peptide bonds on the C-terminal side of prolyl residues. The sequence of PE has been obtained from a mammalian
species (pig) and from bacteria (Flavobacterium meningosepticum and Aeromonas hydrophila); there is a high
degree of sequence conservation between these sequences.

- Escherichia coli protease II (EC 3.4.21.83) (oligopeptidase B) (gene prtB) which cleaves peptide bonds on the
C-terminal side of lysyl and argininyl residues.

- Dipeptidyl peptidase IV (EC 3.4.14.5) (DPP IV). DPP IV is an enzyme that removes N-terminal dipeptides
sequentially from polypeptides having unsubstituted N-termini provided that the penultimate residue is proline.

- Yeast vacuolar dipeptidyl aminopeptidase A (DPAP A) (gene: STE13) which is responsible for the proteolytic
maturation of the alpha-factor precursor.

- Yeast vacuolar dipeptidyl aminopeptidase B (DPAP B) (gene: DAP2).
- Acylamino-acid-releasing enzyme (EC 3.4.19.1) (acyl-peptide hydrolase). This enzyme catalyzes the hydrolysis
of the amino-terminal peptide bond of an N-acetylated protein to generate an N-acetylated amino acid and a protein
with a free amino-terminus.

[1893] A conserved serine residue has experimentally been shown (in E. coli protease II as well as in pig and bacterial
PE) to be necessary for the catalytic mechanism. This serine, which is part of the catalytic triad (Ser, His, Asp), is
generally located about 150 residues away from the C-terminal extremity of these enzymes (which are all proteins that
contain about 700 to 800 amino acids).

[1894] Consensus pattern: D-x(3)-A-x(3)-[LIVMFYW]-x(14)-G-x-S-x-G-G-[LIVMFYW](2) [S is the active site residue]
[1895] Note: these proteins belong to families S9A/S9B/S9C in the classification of peptidases [4,E1].
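
When a pattern annotates one residue as the active site, a named capturing group can report where that residue falls in a sequence. The Python sketch below writes the prolyl oligopeptidase pattern as a regular expression by hand and marks the serine; the function name `active_site_serines` and the toy sequence are assumptions made for illustration, not real protein data.

```python
import re

# Regex form of D-x(3)-A-x(3)-[LIVMFYW]-x(14)-G-x-S-x-G-G-[LIVMFYW](2);
# the named group "ser" marks the active-site serine.
PATTERN = re.compile(r"D.{3}A.{3}[LIVMFYW].{14}G.(?P<ser>S).GG[LIVMFYW]{2}")


def active_site_serines(sequence):
    """Return 1-based positions of serines matched as the active-site residue."""
    return [m.start("ser") + 1 for m in PATTERN.finditer(sequence)]


# Hypothetical 60-residue toy sequence built by hand to contain one match.
toy = "M" * 10 + "DQQQAQQQL" + "Q" * 14 + "GYSYGGFL" + "K" * 19
print(len(toy), active_site_serines(toy))
```

In a real 700 to 800 residue family member, the reported position could then be compared with the expectation that the serine lies roughly 150 residues from the C-terminus.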




[ 1] Rawlings N.D., Polgar L., Barrett A.J. Biochem. J. 279:907-911(1991). 
[ 2] Barrett A.J., Rawlings N.D. Biol. Chem. Hoppe-Seyler 373:353-360(1992). 
[ 3] Polgar L, Szabo E. Biol. Chem. Hoppe-Seyler 373:361-366(1992). 
[ 4] Rawlings N.D., Barrett A.J. Meth. Enzymol. 244:19-61(1994).

[1896] 789. Formate--tetrahydrofolate ligase signatures

[1897] Formate-tetrahydrofolate ligase (EC 6.3.4.3) (formyltetrahydrofolate synthetase) (FTHFS) is one of the
enzymes participating in the transfer of one-carbon units, an essential element of various biosynthetic pathways. In many
of these processes the transfers of one-carbon units are mediated by the coenzyme tetrahydrofolate (THF). Various
reactions generate one-carbon derivatives of THF which can be interconverted between different oxidation states by
FTHFS, methylenetetrahydrofolate dehydrogenase (EC 1.5.1.5) and methenyltetrahydrofolate cyclohydrolase (EC
3.5.4.9).

[1898] In eukaryotes the FTHFS activity is expressed by a multifunctional enzyme, C-1-tetrahydrofolate synthase
(C1-THF synthase), which also catalyzes the dehydrogenase and cyclohydrolase activities. Two forms of C1-THF
synthases are known [1]: one is located in the mitochondrial matrix, while the second one is cytoplasmic. In both forms
the FTHFS domain consists of about 600 amino acid residues and is located in the C-terminal section of C1-THF
synthase. In prokaryotes FTHFS activity is expressed by a monofunctional homotetrameric enzyme of about 560 amino
acid residues [2].

[1899] The sequence of FTHFS is highly conserved in all forms of the enzyme. As signature patterns, two regions
that are almost perfectly conserved were selected. The first one is a glycine-rich segment located in the N-terminal
part of FTHFS and which could be part of an ATP-binding domain [2]. The second pattern is located in the central
section of FTHFS.
[1900] Consensus pattern: G-[LIVM]-K-G-G-A-A-G-G-G-Y
Consensus pattern: V-A-T-[IV]-R-A-L-K-x-[HN]-G-G

[ 1] Shannon K.W., Rabinowitz J.C. J. Biol. Chem. 263:7717-7725(1988). 

[ 2] Lovell C.R., Przybyla A., Ljungdahl L.G. Biochemistry 29:5687-5694(1990). 


[1901] 790. Transthyretin signatures 

[1902] Transthyretin (prealbumin) [1] is a thyroid hormone-binding protein that seems to transport thyroxine (T4)
from the bloodstream to the brain. It is a protein of about 130 amino acids that assembles as a homotetramer and
forms an internal channel that binds thyroxine. Transthyretin is mainly synthesized in the brain choroid plexus. In
humans, variants of the protein are associated with distinct forms of amyloidosis.

[1903] The sequence of transthyretin is highly conserved in vertebrates. A number of uncharacterized proteins also
belong to this family:

- Escherichia coli hypothetical protein yedX.
- Bacillus subtilis hypothetical protein yunM.

- Caenorhabditis elegans hypothetical protein R09H10.3.
- Caenorhabditis elegans hypothetical protein ZK697.8.

[1904] Two regions were selected as signature patterns. The first, located in the N-terminal extremity, starts with a
lysine known to be involved in binding T4. The second pattern is located in the C-terminal extremity.
[1905] Consensus pattern: [KH]-[IV]-L-[DN]-x(3)-G-x-P-A-x(2)-[IV]-x-[IV] [The K binds thyroxine]
Consensus pattern: Y-[TH]-[IV]-[AP]-x(2)-L-S-[PQ]-[FYW]-[GS]-[FY]-[QS]
[1906] [ 1] Schreiber G., Richardson S.J. Comp. Biochem. Physiol. 116B:137-160(1997).
[1907] 791. Dihydropteroate synthase signatures
[1908] All organisms require reduced folate cofactors for the synthesis of a variety of metabolites. Most
microorganisms must synthesize folate de novo because they lack the active transport system of higher vertebrate cells which
allows these organisms to use dietary folates. Enzymes that are involved in the biosynthesis of folates are therefore
the target of a variety of antimicrobial agents such as trimethoprim or sulfonamides.

[1909] Dihydropteroate synthase (EC 2.5.1.15) (DHPS) catalyzes the condensation of
6-hydroxymethyl-7,8-dihydropteridine pyrophosphate to para-aminobenzoic acid to form 7,8-dihydropteroate. This is the second step in the
three-step pathway leading from 6-hydroxymethyl-7,8-dihydropterin to 7,8-dihydrofolate. DHPS is the target of sulfonamides,
which are substrate analogs that compete with para-aminobenzoic acid.

[1910] Bacterial DHPS (gene sul or folP) [1] is a protein of about 275 to 315 amino acid residues which is either
chromosomally encoded or found on various antibiotic resistance plasmids. In the lower eukaryote Pneumocystis
carinii, DHPS is the C-terminal domain of a multifunctional folate synthesis enzyme (gene fas) [2].

[1911] Two signature patterns for DHPS were developed: the first signature is located in the N-terminal section of
these enzymes, while the second signature is located in the central section.
[1912] Consensus pattern: [LIVM]-x-[AG]-[LIVMF](2)-N-x-T-x-D-S-F-x-D-x-[SG]
terminal extremity. The mammalian enzyme differs from the bacterial or yeast proteins by having an EF-hand
calcium-binding region (See <PDOC00018>) in its C-terminal extremity.

[1922] Two signature patterns were developed. One based on the first half of the FAD-binding domain and one which
corresponds to a conserved region in the central part of these enzymes.
[1923] Consensus pattern: [IV]-G-G-G-x(2)-G-[STACV]-G-x-A-x-D-x(3)-R-G
Consensus pattern: G-G-K-x(2)-[GSTE]-Y-R-x(2)-A

[ 1] Austin D., Larson T.J. J. Bacteriol. 173:101-107(1991).
[ 2] Roennow B., Kielland-Brandt M.C. Yeast 9:1121-1130(1993).
[ 3] Brown L.J., McDonald M.J., Lehn D.A., Moran S.M. J. Biol. Chem. 269:14363-14366(1994).

[1924] 794. NOL1/NOP2/sun family signature 

[1925] The following proteins seem to be evolutionarily related:

- Mammalian proliferating-cell nucleolar antigen p120 (gene NOL1) which may play a role in the regulation of the
cell cycle and the increased nucleolar activity that is associated with the cell proliferation.

- Yeast nucleolar protein NOP2 (or YNA1) which could be involved in nucleolar function during the onset of growth,
and in the maintenance of nucleolar structure.
- Yeast hypothetical protein YBL024w.
- Bacterial protein sun (also known as fmu).

- Escherichia coli hypothetical protein yebU.

- Mycobacterium tuberculosis hypothetical protein MtCY21B4.24.

- Methanococcus jannaschii hypothetical protein MJ0026.

NOL1 is a protein of 855 residues, NOP2 consists of 618 residues, YBL024w of 684, sun is a protein of about 430 to
450 residues and MJ0026 has 274 residues. They share a conserved central domain which contains some highly
conserved regions. One of these regions was selected as a signature pattern.
[1926] Consensus pattern: [FV]-D-[KRA]-[LIVMA]-L-x-D-[AV]-P-C-[ST]-[GA]
[1927] 795. moaA / nifB / pqqE family signature

[1928] A number of proteins involved in the biosynthesis of metallo cofactors have been shown [1,2] to be evolutionarily
related. These proteins are:

- Bacterial and archebacterial protein moaA, which is involved in the biosynthesis of the molybdenum cofactor
(molybdopterin; MPT).

- Arabidopsis thaliana cnx2, a protein involved in molybdopterin biosynthesis and which is highly similar to moaA.
- Bacillus subtilis narA, which seems to be the moaA ortholog in that bacterium.

- Bacterial protein nifB (or fixZ) which is involved in the biosynthesis of the nitrogenase iron-molybdenum cofactor.
- Bacterial protein pqqE which is involved in the biosynthesis of the cofactor pyrrolo-quinoline-quinone (PQQ).
- Pyrococcus furiosus cmo, a protein involved in the synthesis of a molybdopterin-based tungsten cofactor.
- Caenorhabditis elegans hypothetical protein F49E2.1.

[1929] All these proteins share, in their N-terminal region, a conserved domain that contains three cysteines. In
moaA, those cysteines have been shown [1] to be important for the biological activity. They could be involved in the
binding of an iron-sulfur cluster.
[1930] Consensus pattern: [LIV]-x(3)-C-[NP]-[LIVMFHQRS]-C-x-[FYM]-C [The three C's are putative Fe-S ligands]

[ 1] Menendez C., Igloi G., Henninger H., Brandsch R. Arch. Microbiol. 164:142-151(1995).
[ 2] Hoff T., Schnorr K.M., Meyer C., Caboche M. J. Biol. Chem. 270:6100-6107(1995).

[1931] 796. Forkhead-associated (FHA) domain profile

[1932] The forkhead-associated (FHA) domain [1,E1] is a putative nuclear signalling domain found in a variety of
otherwise unrelated proteins. The FHA domain comprises approximately 55 to 75 amino acids and contains three highly
conserved blocks separated by divergent spacer regions. Currently it has been found in the following proteins:

- Four transcription factors that also contain a forkhead (FH) domain: mouse myocyte nuclear factor 1 (MNF1), yeast
transcription factor FHL1, which probably controls pre-mRNA processing, and yeast FKH1 and FKH2. In these
proteins the FHA domain is located N-terminal of the DNA-binding FH domain.

- Kinase-associated protein phosphatase (KAPP) from Arabidopsis thaliana, a protein which specifically interacts
with the receptor-type Ser/Thr-kinase RLK5. In KAPP, the FHA domain maps to a region that interacts with the
receptor-type protein kinase RLK5 only if the kinase is phosphorylated on serine residues [2].

- Two protein kinases from yeast that are involved in mediating the nuclear response to DNA damage: DUN1 and
SPK1/SAD1 [3]. The latter is the only known protein containing two copies of the FHA domain.

- Protein kinase cds1 from fission yeast contains a FHA domain and might be the ortholog of SPK1.

- Protein kinase MEK1 from yeast, which is involved in meiotic recombination.

- Human nuclear antigen Ki67 which is expressed only in proliferating cells.

- Yeast hypothetical protein YHR115C, which contains a RING-finger C-terminal of the FHA domain.

- Yeast hypothetical proteins L8083.1 and 9346.10, which contain an extensive coiled-coil region C-terminal of the
FHA domain.

- Caenorhabditis elegans hypothetical protein ZK632.2.
- Caenorhabditis elegans hypothetical protein C01G6.5.

- FraH from the prokaryote Anabaena, which contains a zinc-finger motif N-terminal of the FHA domain.

- An ORF from the bacterium Streptomyces, which is on the opposite strand of the protein kinase pksl, overlapping
the ORF of the kinase.

[ 1] Hofmann K., Bucher P. Trends Biochem. Sci. 20:347-349(1995).

[ 2] Stone J.M., Collinge M.A., Smith R.D., Horn M.A., Walker J.C. Science 266:793-795(1994).
[ 3] Navas T.A., Zhou Z., Elledge S.J. Cell 80:29-39(1995).

[1933] 797. Ald_Xan_dh_C 

Aldehyde oxidase and xanthine dehydrogenase, C terminus 

[1934] [1] Romao MJ, Archer M, Moura I, Moura JJ, LeGall J, Engh R, Schneider M, Hof P, Huber R; Medline:
96072968; "Crystal structure of the xanthine oxidase-related aldehyde oxido-reductase from D. gigas." Science 1995;
270:1170-1176.

Number of members: 54

[1935] 798. Glyco_hydro_38
Glycosyl hydrolases family 38

[1936] Glycosyl hydrolases are key enzymes of carbohydrate metabolism.
Number of members: 20

[1937] [1] Henrissat B; Medline: 98313424; "Glycosidase families." Biochem Soc Trans 1998;26:153-156.

[1938] 799. HECT

HECT-domain (ubiquitin-transferase).

[1939] The name HECT comes from Homologous to the E6-AP Carboxyl Terminus.
Number of members: 43

[1940] [1] Huibregtse JM, Scheffner M, Beaudenon S, Howley PM; Medline: 95223981; "A family of proteins
structurally and functionally related to the E6-AP ubiquitin-protein ligase." Proc Natl Acad Sci U S A 1995;92:2563-2567.
[1941] 800. HRDC

HRDC domain

[1942] The HRDC (Helicase and RNase D C-terminal) domain has a putative role in nucleic acid binding. Mutations
in the HRDC domain cause human disease.
Number of members: 19

[1943] [1] Morozov V, Mushegian AR, Koonin EV, Bork P; Medline: 98060076; "A putative nucleic acid-binding domain
in Bloom's and Werner's syndrome helicases." Trends Biochem Sci 1997;22:417-418.
[1944] 801. Integrase

[1945] Integrase mediates integration of a DNA copy of the viral genome into the host chromosome. Integrase is
composed of three domains. The amino-terminal domain is a zinc binding domain. The central domain is the catalytic
domain [1]. The carboxyl terminal domain is a DNA binding domain [2].
Number of members: 581

[1946]

[1] Dyda F, Hickman AB, Jenkins TM, Engelman A, Craigie R, Davies DR; Medline: 95099322; "Crystal structure
of the catalytic domain of HIV-1 integrase: similarity to other polynucleotidyl transferases." Science 1994;266:
1981-1986.

[2] Lodi PJ, Ernst JA, Kuszewski J, Hickman AB, Engelman A, Craigie R, Clore GM, Gronenborn AM; Medline:
95359147; "Solution structure of the DNA binding domain of HIV-1 integrase." Biochemistry 1995;34:9826-9833.

[1947] 802. lig_chan
Ligand-gated ion channel

[1948] This family includes the four transmembrane regions of the ionotropic glutamate receptors and NMDA
receptors.

Number of members: 128

[1949] [1] Tong G, Shepherd D, Jahr CE; Medline: 95184014; "Synaptic desensitization of NMDA receptors by
calcineurin." Science 1995;267:1510-1512.
[1950] 803. RhoGAP 
RhoGAP domain 

[1951] GTPase activator proteins towards Rho/Rac/Cdc42-like small GTPases. 

Number of members: 97 

[1952]

[1] Musacchio A, Cantley LC, Harrison SC; Medline: 97121392; "Crystal structure of the breakpoint cluster
region-homology domain from phosphoinositide 3-kinase p85 alpha subunit." Proc Natl Acad Sci U S A 1996;93:
14373-14378.

[2] Barrett T, Xiao B, Dodson EJ, Dodson G, Ludbrook SB, Nurmahomed K, Gamblin SJ, Musacchio A, Smerdon
SJ, Eccleston JF; Medline: 97162209; "The structure of the GTPase-activating domain from p50rhoGAP." Nature
1997;385:458-461.

[3] Rittinger K, Walker PA, Eccleston JF, Nurmahomed K, Owen D, Laue E, Gamblin SJ, Smerdon SJ; Medline:
97404320; "Crystal structure of a small G protein in complex with the GTPase-activating protein rhoGAP." Nature
1997;388:693-697.

[4] Boguski MS, McCormick F; Medline: 94081948; "Proteins regulating Ras and its relatives." Nature 1993;366:
643-654.

[1953] 804. vwd 

von Willebrand factor type D domain 

[1954] [1] Bork P; Medline: 93327926; "The modular architecture of a new family of growth regulators related to
connective tissue growth factor." FEBS Lett 1993;327:125-130.

Number of members: 92 

[1955] 805. zf-C4_Topoisom 
Topoisomerase DNA binding C4 zinc finger 

[1] Tse-Dinh YC, Beran-Steed RK; Medline: 89034032; "Escherichia coli DNA topoisomerase I is a zinc
metalloprotein with three repetitive zinc-binding domains." J Biol Chem 1988;263:15857-15859.
[2] Ahumada A, Tse-Dinh YC; Medline: 99011409; "The Zn(II) binding motifs of E. coli DNA topoisomerase I is part
of a high-affinity DNA binding domain." Biochem Biophys Res Commun 1998;251:509-514.

Number of members: 51

[1956] 806. AIRC
AIR carboxylase
Members of this family catalyse the decarboxylation of 1-(5-phosphoribosyl)-5-amino-4-imidazole-carboxylate (AIR).
This family catalyses the sixth step of de novo purine biosynthesis. Some members of this family contain two copies of
this domain. Number of members: 38
[1957] 807. Bromodomain signature and profile
PROSITE cross-reference(s): PS00633; BROMODOMAIN_1, PS50014;
BROMODOMAIN_2
The bromodomain [1,2,3] is a conserved region of about 70 amino acids found in the following proteins:

- Higher eukaryotes transcription initiation factor TFIID 250 Kd subunit (TBP-associated factor p250) (gene CCG1).
P250 is associated with the TFIID TATA-box binding protein and seems essential for progression of the G1 phase
of the cell cycle.

- Human RING3, a protein of unknown function encoded in the MHC class II locus.

- Mammalian CREB-binding protein (CBP), which mediates cAMP-gene regulation by binding specifically to
phosphorylated CREB protein.

- Drosophila female sterile homeotic protein (gene fsh), required maternally for proper expression of other homeotic
genes involved in pattern formation, such as Ubx.

- Drosophila brahma protein (gene brm), a protein required for the activation of multiple homeotic genes.

- Mammalian homologs of brahma. In human, three brahma-like proteins are known: SNF2a (hBRM), SNF2b, and
BRG1.

- Human BS69, a protein that binds to adenovirus E1A and inhibits E1A transactivation.
- Human peregrin (or Br140).
- Yeast BDF1 [3], a transcription factor involved in the expression of a broad class of genes including snRNAs.

- Yeast GCN5, a general transcriptional activator operating in concert with certain other DNA-binding transcriptional
activators, such as GCN4, HAP2/3/4 or ADA2.

- Yeast NPS1/STH1, involved in G(2) phase control in mitosis.
- Yeast SNF2/SWI2, which is part of a complex with the SNF5, SNF6, SWI3 and ADR6/SWI1 proteins. This
SWI-complex is involved in transcriptional activation.

- Yeast SPT7, a transcriptional activator of Ty elements and possibly other genes.
- Caenorhabditis elegans protein cbp-1.
- Yeast hypothetical protein YGR056w.
- Yeast hypothetical protein YKR008w.

- Yeast hypothetical protein L9638.1.

[1958] Some proteins contain a region which, while similar to some extent to a classical bromodomain, diverges from
it by either lacking part of the domain or because of an insertion. These proteins are:

- Mammalian protein HRX (also known as All-1 or MLL), a protein involved in translocations leading to acute
leukemias and which possibly acts as a transcriptional regulatory factor. HRX contains a region similar to the C-terminal
half of the bromodomain.

- Caenorhabditis elegans hypothetical protein ZK783.4. The bromodomain of this protein has a 23 amino-acid
insertion.

- Yeast protein YTA7. This protein contains a region with significant similarity to the C-terminal half of the
bromodomain. As it is a member of the AAA family (see <PDOC00572>) it is also in a functionally different context.

[1959] The above proteins generally contain a single bromodomain, but some of them contain two copies; this is the
case of BDF1, CCG1, fsh, RING3, YKR008w and L9638.1.

[1960] The exact function of this domain is not yet known but it is thought to be involved in protein-protein interactions
and it may be important for the assembly or activity of multicomponent complexes involved in transcriptional activation.

[1961] The consensus pattern that has been developed spans a major part of the bromodomain; a more sensitive
detection is available through the use of a profile which spans the whole domain.
Consensus pattern: [STANVF]-x(2)-F-x(4)-[DNS]-x(5,7)-[DENQTF]-Y-[HFY]-x(2)-[LIVMFY]-x(3)-[LIVM]-x(4)-[LIVM]-
x(6,8)-Y-x(12,13)-[LIVM]-x(2)-N-[SACF]-x(2)-[FY]
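
How much of the domain such a pattern covers can be worked out by summing the repeat counts of its elements. The Python sketch below does this for the bromodomain pattern as transcribed in the code; the function name `pattern_length_range` is an assumption made for illustration.

```python
import re


def pattern_length_range(pattern):
    """Return (shortest, longest) peptide length a PROSITE-style pattern can match."""
    shortest = longest = 0
    for element in pattern.split("-"):
        # A trailing "(n)" or "(n,m)" gives the repeat count of the element.
        m = re.search(r"\((\d+)(?:,(\d+))?\)$", element)
        if m:
            low = int(m.group(1))
            high = int(m.group(2)) if m.group(2) else low
        else:
            low = high = 1  # a single residue or residue class
        shortest += low
        longest += high
    return shortest, longest


BROMODOMAIN = (
    "[STANVF]-x(2)-F-x(4)-[DNS]-x(5,7)-[DENQTF]-Y-[HFY]-x(2)-[LIVMFY]-x(3)-"
    "[LIVM]-x(4)-[LIVM]-x(6,8)-Y-x(12,13)-[LIVM]-x(2)-N-[SACF]-x(2)-[FY]"
)
print(pattern_length_range(BROMODOMAIN))
```

For the pattern as written here the result is 56 to 61 residues, which agrees with the statement that it spans a major part of a domain of about 70 amino acids.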
References
[1962]

[ 1] Haynes S.R., Dollard C., Winston F., Beck S., Trowsdale J., Dawid I.B. Nucleic Acids Res. 20:2603-2603
(1992).
residues.

[1981] Signature patterns were developed for both conserved regions.

[1982] Consensus pattern: [EDQH]-x-K-x-[DN]-G-x-R-[GACIVM] [K is the active site residue]

[1983] Consensus pattern: E-G-[LIVMA]-[LIVM](2)-[KR]-x(5,8)-[YW]-[QNEK]-x(2,6)-[KRH]-x(3,5)-K-[LIVMFY]-K
Sequences known to belong to this class detected by the pattern: ALL, except for archebacterial DNA ligases.

[1]
Tomkinson A.E., Totty N.F., Ginsburg M., Lindahl T.
Proc. Natl. Acad. Sci. U.S.A. 88:400-404(1991).
[2]
Lindahl T., Barnes D.E.
Annu. Rev. Biochem. 61:251-281(1992).
[3]
Kletzin A.
Nucleic Acids Res. 20:5389-5396(1992).

[1984] 812. (FAD_Gly3P_dh) FAD-dependent glycerol-3-phosphate dehydrogenase signatures PROSITE
cross-reference(s): PS00977; FAD_G3PDH_1, PS00978; FAD_G3PDH_2

[1985] FAD-dependent glycerol-3-phosphate dehydrogenase (EC 1.1.99.5) (GPD) catalyzes the conversion of
glycerol-3-phosphate into dihydroxyacetone phosphate. In bacteria [1] it is associated with the utilization of glycerol coupled
to respiration. In Escherichia coli, two isozymes are known: one expressed under anaerobic conditions (gene glpA)
and one in aerobic conditions (gene glpD). In eukaryotes, a mitochondrial form of GPD participates in the glycerol
phosphate shuttle in conjunction with an NAD-dependent cytoplasmic GPD (EC 1.1.1.8) [2,3].
[1986] These enzymes are proteins of about 60 to 70 Kd which contain a probable FAD-binding domain in their
N-terminal extremity. The mammalian enzyme differs from the bacterial or yeast proteins by having an EF-hand
calcium-binding region (See <PDOC00018>) in its C-terminal extremity.

[1987] Two signature patterns were developed. One based on the first half of the FAD-binding domain and one which
corresponds to a conserved region in the central part of these enzymes.
[1988] Consensus pattern: [IV]-G-G-G-x(2)-G-[STACV]-G-x-A-x-D-x(3)-R-G
Consensus pattern: G-G-K-x(2)-[GSTE]-Y-R-x(2)-A

[1]
Austin D., Larson T.J.
J. Bacteriol. 173:101-107(1991).
[2]
Roennow B., Kielland-Brandt M.C.
Yeast 9:1121-1130(1993).
[3]
Brown L.J., McDonald M.J., Lehn D.A., Moran S.M.
J. Biol. Chem. 269:14363-14366(1994).

[1989] 813. (Fapy_DNA_glyco) Formamidopyrimidine-DNA glycosylase signature PROSITE cross-reference(s):
PS01242; FPG

[1990] Formamidopyrimidine-DNA glycosylase (EC 3.2.2.23) [1] (Fapy-DNA glycosylase) (gene fpg) is a bacterial
enzyme involved in DNA repair which excises oxidized purine bases to release 2,6-diamino-4-hydroxy-5N-methylformamidopyrimidine (Fapy) and 7,8-dihydro-8-oxoguanine (8-OxoG) residues. In addition to its glycosylase activity,
FPG can also nick DNA at apurinic/apyrimidinic sites (AP sites). FPG is a monomeric protein of about 32 Kd which
binds and requires zinc for its activity.

[1991] The binding site for zinc seems to be located in the C-terminal part of the enzyme, where four conserved and
essential [2] cysteines are located. A signature pattern was developed based on this region.

[1992] Consensus pattern: C-x(2,4)-C-x-[GTAQ]-x-[IV]-x(7)-R-[GSTAN]-[STA]-x-[FYI]-C-x(2)-C-Q
[The four C's are putative zinc ligands]

[1]

Duwat P., de Oliveira R., Ehrlich S.D., Boiteux S.
Microbiology 141:411-417(1995).

[2]

O'Connor T.E., Graves R.J., Demurcia G., Castaing B., Laval J.
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J, Biol, Chom. 208:9003-9070(1993). 

[1993] 814. (G_glu_transpept) Gamma-glutamyltranspeptidase signature PROSITE cross-reference(s): PS00462;
G_GLU_TRANSPEPTIDASE

[1994] Gamma-glutamyltranspeptidase (EC 2.3.2.2) (GGT) [1] catalyzes the transfer of the gamma-glutamyl moiety
of glutathione to an acceptor that may be an amino acid, a peptide or water (forming glutamate). GGT plays a key role
in the gamma-glutamyl cycle, a pathway for the synthesis and degradation of glutathione. In prokaryotes and eukaryotes, it is an enzyme that consists of two polypeptide chains, a heavy and a light subunit, processed from a single
chain precursor. The active site of GGT is known to be located in the light subunit.

[1995] The sequences of mammalian and bacterial GGT show a number of regions of high similarity [2]. Pseudomonas cephalosporin acylases (EC 3.5.1.-) that convert 7-beta-(4-carboxybutanamido)-cephalosporanic acid (GL-7ACA) into 7-aminocephalosporanic acid (7ACA) and glutaric acid are evolutionarily related to GGT and also show
some GGT activity [3]. Like GGT, these GL-7ACA acylases are also composed of two subunits.
[1996] One of the conserved regions corresponds to the N-terminal extremity of the mature light chains of these
enzymes. This region was used as a signature pattern.

[1997] Consensus pattern: T-[STA]-H-x-[STHLIVMA]-x(1,2)-[FY]-G

[1]

Tate S.S., Meister A.
Meth. Enzymol. 113:400-419(1985).

[2]

Suzuki H., Kumagai H., Echigo T., Tochikura T.
J. Bacteriol. 171:5169-5172(1989).
[3]

Ishiye M., Niwa M.
Biochim. Biophys. Acta 1132:233-239(1992).

[1998] 815. G-protein gamma subunit profile 

PROSITE cross-reference(s): PS50058; G_PROTEIN_GAMMA

[1999] Guanine nucleotide-binding proteins (G proteins) [1] act as intermediaries in the transduction of signals generated by transmembrane receptors. G proteins consist of three subunits (alpha, beta, and gamma). The alpha subunit
binds to and hydrolyzes GTP; the functions of the beta and gamma subunits are less clear but they seem to be required
for the replacement of GDP by GTP as well as for membrane anchoring and receptor recognition.
[2000] The gamma subunits are small proteins (from 70 to 110 residues) that are bound to the membrane via an
isoprenyl group (either a farnesyl or a geranylgeranyl) covalently linked to their C-terminus. In mammals there are at
least 12 different isoforms of gamma subunits.

[2001] The Caenorhabditis elegans protein egl-10, which is a regulator of G-protein signalling, contains a G-protein 
gamma-like domain. 

[2002] A profile was developed that spans the complete length of the gamma subunit. 
[1]

Pennington S.R. 

Protein Prof. 2:16-315(1995). 

[2003] 816. GNS1/SUR4 family signature 

PROSITE cross-reference(s): PS01188; GNS1_SUR4 
[2004] The following group of eukaryotic integral membrane proteins, whose exact function has not yet clearly been
established, are evolutionarily related [1]:

Yeast GNS1 [2], a protein involved in synthesis of 1,3-beta-glucan.
Yeast SUR4 (or APA1, SRE1) [3], a protein that could act in a glucose-signaling pathway that controls the expression of several genes that are transcriptionally regulated by glucose.
Yeast hypothetical protein YJL196c.
Caenorhabditis elegans hypothetical protein C40H1.4.
Caenorhabditis elegans hypothetical protein D2024.3.

[2005] The proteins have from 290 to 435 amino acid residues. Structurally, they seem to be formed of three sections:
an N-terminal region with two transmembrane domains, a central hydrophilic loop and a C-terminal region that contains
from one to three transmembrane domains. A conserved region that contains three histidines was selected as a signature pattern. This region is located in the hydrophilic loop.
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ATP -binding proteins in 
RNA helicases [8,9,10].
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GTP-binding elongation factors (EF-Tu, EF-1atpha, EF-G, EF-2, etc.). 
Ras family of GTP-binding proteins (Ras, Rho, Rab, Ral, Ypt1, SEC4, etc.).
Nuclear protein ran (see <PDOC00859>).
ADP-ribosylation factors family (see <PDOC00781>).
Bacterial dnaA protein (see <PDOC00771>).
Bacterial recA protein (see <PDOC00131>).
Bacterial recF protein (see <PDOC00539>).
Guanine nucleotide-binding proteins alpha subunits (Gi, Gs, Gt, G0, etc.).
DNA mismatch repair proteins mutS family (see <PDOC00388>).
Bacterial type II secretion system protein E (see <PDOC00567>).

[2021] Not all ATP- or GTP-binding proteins are picked up by this motif. A number of proteins escape detection
because the structure of their ATP-binding site is completely different from that of the P-loop. Examples of such proteins
are the E1-E2 ATPases or the glycolytic kinases. In other ATP- or GTP-binding proteins the flexible loop exists in a
slightly different form; this is the case for tubulins or protein kinases. A special mention must be reserved for adenylate
kinase, in which there is a single deviation from the P-loop pattern: in the last position Gly is found instead of Ser or Thr.
[2022] Consensus pattern: [AG]-x(4)-G-K-[ST]
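
As an illustration of the adenylate kinase exception noted above, the sketch below scans a sequence with the P-loop pattern and, optionally, with a relaxed variant that also accepts Gly in the last position. The relaxed variant, the function name `find_p_loops` and the two short example segments are assumptions made for this example only; they are not part of the PROSITE entry.

```python
import re

# [AG]-x(4)-G-K-[ST] as a regular expression. Wrapping the motif in a
# lookahead makes finditer report overlapping hits as well.
P_LOOP = re.compile(r"(?=([AG].{4}GK[ST]))")
# Hypothetical relaxed form: Gly also allowed in the last position.
P_LOOP_RELAXED = re.compile(r"(?=([AG].{4}GK[STG]))")

def find_p_loops(sequence, relaxed=False):
    """Return a list of (1-based start position, matched segment) tuples."""
    regex = P_LOOP_RELAXED if relaxed else P_LOOP
    return [(m.start() + 1, m.group(1)) for m in regex.finditer(sequence)]

# Illustrative segments: one Ras-like, one adenylate-kinase-like.
ras_like = "MTEYKLVVVGAGGVGKSALTIQ"
adk_like = "MRIILLGAPGAGKGTQAQFIME"

strict_ras = find_p_loops(ras_like)
strict_adk = find_p_loops(adk_like)
relaxed_adk = find_p_loops(adk_like, relaxed=True)
```

With the strict pattern the adenylate-kinase-like segment yields no hit, while the relaxed pattern recovers the GK-containing loop; this is the single-position deviation the paragraph describes.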

[1]

Walker J.E., Saraste M., Runswick M.J., Gay N.J.
EMBO J. 1:945-951(1982).
[2]

Moller W., Amons R.
FEBS Lett. 186:1-7(1985).

[3]

Fry D.C., Kuby S.A., Mildvan A.S.
Proc. Natl. Acad. Sci. U.S.A. 83:907-911(1986).

[4]

Dever T.E., Glynias M.J., Merrick W.C.
Proc. Natl. Acad. Sci. U.S.A. 84:1814-1818(1987).

[5]

Saraste M., Sibbald P.R., Wittinghofer A.
Trends Biochem. Sci. 15:430-434(1990).
[6]

Koonin E.V.
J. Mol. Biol. 229:1165-1174(1993).
[7]

Higgins C.F., Hyde S.C., Mimmack M.M., Gileadi U., Gill D.R., Gallagher M.P.
J. Bioenerg. Biomembr. 22:571-592(1990).
[8]

Hodgman T.C.
Nature 333:22-23(1988) and Nature 333:578-578(1988) (Errata).
[9]

Linder P., Lasko P., Ashburner M., Leroy P., Nielsen P.J., Nishi K.,
Schnier J., Slonimski P.P.
Nature 337:121-122(1989).
[10]

Gorbalenya A.E., Koonin E.V., Donchenko A.P., Blinov V.M.
Nucleic Acids Res. 17:4713-4730(1989).


[2023] 821. PE: PE family

This family is named after a PE motif near the amino terminus of the domain. The PE family of proteins all contain an
amino-terminal region of about 110 amino acids. The carboxyl termini of this family are variable and fall into several
classes. The largest class of PE proteins is the highly repetitive PGRS class, which has a high glycine content. The
function of these proteins is uncertain but it has been suggested that they may be related to antigenic variation of
Mycobacterium tuberculosis [1]. Number of members: 88
[2024] [1] Medline: 98295987, Deciphering the biology of Mycobacterium tuberculosis from the complete genome
sequence. Cole ST, Brosch R, Parkhill J, Garnier T, Churcher C, Harris D, Gordon SV, Eiglmeier K, Gas S, Barry CE
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3rd Tokala F, Bndcock K, Bflsham D, Brown D, Chillingworth T, Connor R, Davies R, Devlin K, Feltwell T. Gentles S, 
Hamlin N, Holroyd S, Hornsby T, Jagels K, Barrell BG; et al; Nature 1998;393:537-544.
[2025] 822. (RNB) Ribonuclease II family signature
PROSITE cross-reference(s): PS01175; RIBONUCLEASE_II
[2026] On the basis of sequence similarities, the following bacterial and eukaryotic proteins seem to form a family.

- Escherichia coli and related bacteria ribonuclease II (EC 3.1.13.1) (RNase II) (gene rnb) [1]. RNase II is an exonuclease involved in mRNA decay. It degrades mRNA by hydrolyzing single-stranded polyribonucleotides processively in the 3' to 5' direction.
- Bacterial protein vacB. In Shigella flexneri, vacB has been shown to be required for the expression of virulence
genes at the posttranscriptional level.
- Yeast protein SSD1 (or SRK1), which is implicated in the control of the cell cycle G1 phase.
- Yeast protein DIS3 [2], which binds to ran (GSP1) and enhances the nucleotide-releasing activity of RCC1 on ran.
- Fission yeast protein dis3, which is implicated in mitotic control.
- Neurospora crassa cyt-4, a mitochondrial protein required for RNA 5' and 3' end processing and splicing.
- Yeast protein MSU1, which is involved in mitochondrial biogenesis.
- Synechocystis strain PCC 6803 protein zam [3], which controls resistance to the carbonic anhydrase inhibitor acetazolamide.
- Caenorhabditis elegans hypothetical protein F48E8.6.

[2027] The size of these proteins ranges from 644 residues (rnb) to 1250 (SSD1). While their sequences are highly
divergent, they share a conserved domain in their C-terminal section [4]. It is possible that this domain plays a role in
a putative exonuclease function that would be common to all these proteins. A signature pattern was developed based
on this domain.

[2028] Consensus pattern: [...]-[FY]-x-D-x(3)-[HQ]

[1]

Zilhao R., Camelo L., Arraiano C.M.
Mol. Microbiol. 8:43-51(1993).

[2]

Noguchi E., Hayashi N., Azuma Y., Seki T., Nakamura M., Nakashima N.,
Yanagida M., He X., Mueller U., Sazer S., Nishimoto T.
EMBO J. 15:5595-5605(1996).
[3]

Beuf L., Bedu S., Cami B., Joset R.
Plant Mol. Biol. 27:779-788(1995).

[4]

Mian I.S.
Nucleic Acids Res. 25:3187-3195(1997).

[2029] 823. Src homology 2 (SH2) domain profile

PROSITE cross-reference(s): PS50001; SH2
[2030] The Src homology 2 (SH2) domain is a protein domain of about 100 amino-acid residues first identified as a
conserved sequence region between the oncoproteins Src and Fps [1]. Similar sequences were later found in many
other intracellular signal-transducing proteins [2]. SH2 domains function as regulatory modules of intracellular signalling
cascades by interacting with high affinity to phosphotyrosine-containing target peptides in a sequence-specific and
strictly phosphorylation-dependent manner [3,4,5,6].

[2031] The SH2 domain has a conserved 3D structure consisting of two alpha helices and six to seven beta-strands.
The core of the domain is formed by a continuous beta-meander composed of two connected beta-sheets [7].
[2032] So far, SH2 domains have been identified in the following proteins: 

- Many vertebrate, invertebrate and retroviral cytoplasmic (non-receptor) protein tyrosine kinases, in particular in
the Src, Abl, Btk, Csk and ZAP70 families of kinases.
- Mammalian phosphatidylinositol-specific phospholipase C gamma-1 and -2. Two copies of the SH2 domain are
found in those proteins in between the catalytic 'X-' and 'Y-boxes' (see <PDOC50007>).
- Mammalian phosphatidylinositol 3-kinase regulatory p85 subunit.
- Some vertebrate and invertebrate protein-tyrosine phosphatases.
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cytoplasmic proteins » fj*^ and actjvators of transcription). 
STAT proteins (signal transducers and activators of transcription).

Chicken tensin.

Sadowski I., Stone J.C., Pawson T.
Mol. Cell. Biol. 6:4396-4408(1986).

Russell R.B., Breed J., Barton G.J.
FEBS Lett. 304:15-20(1992).

Pawson T., Schlessinger J.
Curr. Biol. 3:434-442(1993).

[5]

Mayer B.J., Baltimore D.
Trends Cell. Biol. 3:8-13(1993).

[6]

Pawson T.
Nature 373:573-580(1995).

S-etan™.» ' ■ 3 ' he K8 "™ Sly ' 0San "' eS h8ma,a 

transmembrane domains. The best conserved region is used as a signature pattern.

[2037] Consensus pattern: [PAV]-x-Y-[GS]-L-Y-[STAG](2)-x(4)-[LIVFYA]-[LIVST]-[YI]-x(3)-[GA]-[GST]-S-[KR]
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[1] 

Sandal N.N., Marcker K.A.
Trends Biochem. Sci. 19:19-19(1994).

[2]

Smith F.W., Hawkesford M.J., Prosser I.M., Clarkson D.T.
Mol. Gen. Genet. 247:709-715(1995).

[2038] 825. TYA; TYA transposon protein

Ty are yeast transposons. A 5.7 kb transcript codes for p3, a fusion protein of TYA and TYB. The TYA protein is analogous
to the gag protein of retroviruses. TYA is cleaved to form a 46 kd protein which can form mature virion-like particles [1].

Number of members: 59

[2039] [1] Medline: 97404699. Cryo-electron microscopy structure of yeast Ty retrotransposon virus-like particles.
Palmer KJ, Tichelaar W, Myers N, Burns NR, Butcher SJ, Kingsman AJ, Fuller SD, Saibil HR; J Virol 1997;71:6863-6868.
[2040] 826. Aldolase_II

Class II Aldolase and Adducin N-terminal domain.

-!- This family includes class II aldolases and adducins which have not been ascribed any enzymatic function. Number
of members: 37

References:

[2041]

[1] Medline: 93294819. The spatial structure of the class II L-fuculose-1-phosphate aldolase from Escherichia coli.
Dreyer MK, Schulz GE; J Mol Biol 1993;231:549-553.

[2] Medline: 96256522. Catalytic mechanism of the metal-dependent fuculose aldolase from Escherichia coli as
derived from the structure. Dreyer MK, Schulz GE; J Mol Biol 1996;259:458-466.

[2042] 827. CBD_2

-!- Two tryptophan residues are involved in cellulose binding.

-!- Cellulose binding domain found in bacteria. Number of members: 51

References:

[2043] [1] Medline: 95284032. Solution structure of a cellulose-binding domain from Cellulomonas fimi by nuclear
magnetic resonance spectroscopy. Xu GY, Ong E, Gilkes NR, Kilburn DG, Muhandiram DR, Harris-Brandts M, Carver
JP, Kay LE, Harvey TS; Biochemistry 1995;34:6993-7009.
[2044] 828. P

A unique feature of the eukaryotic subtilisin-like proprotein convertases is the presence of an additional highly conserved sequence of approximately 150 residues (P domain) located immediately downstream of the catalytic domain.
Number of members: 91

References:

[2045]

[1] Medline: 94252314. A C-terminal domain conserved in precursor processing proteases is required for intramolecular N-terminal maturation of pro-Kex2 protease. Gluschankof P, Fuller RS; EMBO J 1994;13:2280-2288.

[2] Medline: 98225190. Regulatory roles of the P domain of the subtilisin-like prohormone convertases. Zhou A,
Martin S, Lipkind G, LaMendola J, Steiner DF; J Biol Chem 1998;273:11107-11114.

[2046] 829. Uncharacterized protein family UPF0020 signature 
PROSITE cross-reference(s): PS01261; UPF0020

The following uncharacterized proteins have been shown [1] to share regions of similarities: 

- Escherichia coli hypothetical protein ycbY and HI0116/15, the corresponding Haemophilus influenzae protein. 
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[2072] 835. (Asp_Glu_race) 

Aspartate and glutamate racemases signatures 

[2073] Cross-reference(s) PS00923; ASP_GLU_RACEMASE_1 PS00924; 
ASP_GLU_RACEMASE_2 

[2074] Aspartate racemase (EC 5.1.1.13) and glutamate racemase (EC 5.1.1.3) are two evolutionarily related bacterial
enzymes that do not seem to require a cofactor for their activity [1]. Glutamate racemase, which interconverts L-glutamate into D-glutamate, is required for the biosynthesis of peptidoglycan and some peptide-based antibiotics such as
gramicidin S. In addition to characterized aspartate and glutamate racemases, this family also includes a hypothetical
protein from Erwinia carotovora and one from Escherichia coli (ygeA). Two conserved cysteines are present in the
sequence of these enzymes. They are expected to play a role in catalytic activity by acting as bases in proton abstraction
from the substrate. Signature patterns were developed for both cysteines.

[2075] Consensus pattern: [IVA]-[LIVM]-x-C-x(0,1)-N-[ST]-[MSA]-[STH]-[LIVFYSTANK]

Consensus pattern: [LIVM](2)-x-[AG]-C-T-[DEH]-[LIVMFYHPNGRS]-x-[LIVM]

[2076] [1] Gallo K.A., Knowles J.R., Biochemistry 32:3981-3990(1993).

[2077] 836. (ATP-sulfurylase) 

ATP-sulfurylase

[2078] This family consists of ATP-sulfurylase or sulfate adenylyltransferase (EC:2.7.7.4), some of which are part of a
bifunctional polypeptide chain associated with adenosyl phosphosulphate (APS) kinase (APS_kinase). Both enzymes
are required for PAPS (phosphoadenosine-phosphosulfate) synthesis from inorganic sulphate [2]. ATP sulfurylase
catalyses the synthesis of adenosine-phosphosulfate (APS) from ATP and inorganic sulphate [1].

Number of members: 37

[2079]

[1] Kurima K, Warman ML, Krishnan S, Domowicz M, Krueger RC Jr, Deyrup A, Schwartz NB; Medline: 98337975
"A member of a family of sulfate-activating enzymes causes murine brachymorphism" [published erratum appears
in Proc Natl Acad Sci U S A 1998 Sep 29;95(20):12071] Proc Natl Acad Sci U S A 1998;95:8681-8685.
[2] Rosenthal E, Leustek T; Medline: 96096529 "A multifunctional Urechis caupo protein, PAPS synthetase, has
both ATP sulfurylase and APS kinase activities." Gene 1995;165:243-248.

[2080] 837. (ATP-synt_F) 
ATP synthase (F/14-kDa) subunit 

[2081] This family includes 14-kDa subunit from vATPases [1], which is in the peripheral catalytic part of the complex 
[2]. The family also includes archaebacterial ATP synthase subunit F [3]. 

Number of members: 23 

[2082]
[1] Guo Y, Kaiser K, Wieczorek H, Dow JA; Medline: 96269411 "The Drosophila melanogaster gene vha14 encoding
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a 14-kDa F-subunit of the vacuolar ATPase." Gene 1996;172:239-243. 

[2] Peng SB, Crider BP, Tsai SJ, Xie XS, Stone DK; Medline: 96216416 "Identification of a 14-kDa subunit associated
with the catalytic sector of clathrin-coated vesicle H+-ATPase." J Biol Chem 1996;271:3324-3327.
[3] Wilms R, Freiberg C, Wegerle E, Meier I, Mayer F, Muller V; Medline: 96324968 "Subunit structure and organization of the genes of the A1A0 ATPase from the Archaeon Methanosarcina mazei Go1." J Biol Chem 1996;271:18843-18852.

[2083] 838. (CBD_4) 
Starch binding domain 

Number of members: 48 


[2084] 839. (CbiX)
[2085] The function of CbiX is uncertain; however, it is found in cobalamin biosynthesis operons and so may have a
related function. Some CbiX proteins contain a striking histidine-rich region at their C-terminus, which suggests that it
might be involved in metal chelation [1].

Number of members: 6

[2086] [1] Raux E, Lanois A, Warren MJ, Rambach A, Thermes C; Medline: 98416126 "Cobalamin (vitamin B12)
biosynthesis: identification and characterization of a Bacillus megaterium cobI operon." Biochem J 1998;335:159-166.

840. (Complex1_51K) 

[2087] Respiratory-chain NADH dehydrogenase 51 Kd subunit signatures Cross-reference(s) PS00644;
COMPLEX1_51K_1 PS00645; COMPLEX1_51K_2

[2088] Respiratory-chain NADH dehydrogenase (EC 1.6.5.3) [1,2] (also known as complex I or NADH-ubiquinone
oxidoreductase) is an oligomeric enzymatic complex located in the inner mitochondrial membrane which also seems
to exist in the chloroplast and in cyanobacteria (as a NADH-plastoquinone oxidoreductase). Among the 25 to 30
polypeptide subunits of this bioenergetic enzyme complex there is one with a molecular weight of 51 Kd (in mammals),
which is the second largest subunit of complex I and is a component of the iron-sulfur (IP) fragment of the enzyme. It
seems to bind to NAD, FMN, and a 2Fe-2S cluster.

[2089] The 51 Kd subunit is highly similar to [3,4]:

- Subunit alpha of Alcaligenes eutrophus NAD-reducing hydrogenase (gene hoxF) which also binds to NAD, FMN,
and a 2Fe-2S cluster.
- Subunit NQO1 of Paracoccus denitrificans NADH-ubiquinone oxidoreductase.
- Subunit F of Escherichia coli NADH-ubiquinone oxidoreductase (gene nuoF).

[2090] The 51 Kd subunit and the bacterial hydrogenase alpha subunit contain three regions of sequence similarities. The first one most probably corresponds to the NAD-binding site, the second to the FMN-binding site, and the
third one, which contains three cysteines, to the iron-sulfur binding region. Signature patterns have been developed
for the FMN-binding and for the 2Fe-2S binding regions.

[2091] Consensus pattern: G-[AM]-G-[AR]-Y-[LIVM]-C-G-[DE](2)-[STA](2)-[LIM](2)-[EN]-S
Consensus pattern: E-S-C-G-x-C-x-P-C-R-x-G [The three C's are putative 2Fe-2S ligands]
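
When a pattern marks particular residues as putative ligands, it can be useful to report where those residues fall in a matched sequence. The sketch below, reading the second pattern as E-S-C-G-x-C-x-P-C-R-x-G, captures each of the three cysteines in its own group. The function name `ligand_positions` and the test peptide are hypothetical and serve only as an illustration.

```python
import re

# E-S-C-G-x-C-x-P-C-R-x-G with each cysteine captured as a separate group.
FE_S_SITE = re.compile(r"ES(C)G.(C).P(C)R.G")

def ligand_positions(sequence):
    """Return 1-based positions of the three putative 2Fe-2S ligand
    cysteines in the first match, or None if the pattern is absent."""
    m = FE_S_SITE.search(sequence)
    if m is None:
        return None
    return [m.start(group) + 1 for group in (1, 2, 3)]

# Synthetic peptide, not a real 51 Kd subunit sequence.
toy = "AAGESCGKCTPCREGAA"
positions = ligand_positions(toy)
```

The same capture-group approach would apply to other patterns in this section that annotate specific residues, such as the zinc-binding cysteines of the FPG signature.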

[1] Ragan C.I., Curr. Top. Bioenerg. 15:1-36(1987).

[2] Weiss H., Friedrich T., Hofhaus G., Preis D., Eur. J. Biochem. 197:563-576(1991).
[3] Fearnley I.M., Walker J.E., Biochim. Biophys. Acta 1140:105-134(1992).
[4] Weidner U., Geier S., Ptock A., Friedrich T., Leif H., Weiss H., J. Mol. Biol. 233:109-122(1993).

[2092] 841. (DAP_epimerase)
Diaminopimelate epimerase signature
[2093] Cross-reference(s) PS01326; DAP_EPIMERASE
Diaminopimelate epimerase (EC 5.1.1.7) catalyzes the isomerization of L,L- to D,L-meso-diaminopimelate in the
biosynthetic pathway leading from aspartate to lysine. This enzyme is a protein of about 30 Kd. Two conserved cysteines
seem [1] to function as the acid and base in the catalytic mechanism. As a signature pattern, the region surrounding
the first of these two active site cysteines was selected.
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Number of members: 42 
[2107] 845. (DUF56)

[2108] Integral membrane protein

[2109] The members of this family are putative integral membrane proteins. The function of the family is unknown;
however, the family includes Sec59 from yeast. Sec59 is a dolichol kinase (EC:2.7.1.108), but it is not clear if the enzymatic
activity resides in this region or its N-terminal region.

Number of members: 13


[2110] 846. (DUF94) 

[2111] Domain of unknown function
[2112] The function of this domain is unknown. It is found in both eukaryotes and archaebacteria. The alignment
contains a completely conserved aspartate residue that may be functionally important. The eukaryotic domains contain
three conserved cysteines and a histidine that might be metal binding; however, these are absent in the archaebacterial
proteins.

Number of members: 9 

[2113] 847. (FF)
[2114] FF domain

[2115] This domain may be involved in protein-protein interaction [1].

Number of members: 42

[2116] [1] Bedford MT, Leder P; Medline: 99322199 "The FF domain: a novel motif that often accompanies WW
domains." Trends Biochem Sci 1999;24:264-265.
[2117] 848. (FLO_LFY)

Floricaula / Leafy protein
[2118] This family consists of various plant development proteins which are homologues of floricaula (FLO) and Leafy
(LFY) proteins, which are floral meristem identity proteins. Mutations in the sequences of these proteins affect flower
and leaf development.

Number of members: 16

[2119]

[1] Hofer J, Turner L, Hellens R, Ambrose M, Matthews R, Michael A, Ellis N; Medline: 97411151 "UNIFOLIATA
regulates leaf and flower morphogenesis in pea." Curr Biol 1997;7:581-587.
[2] Weigel D, Alvarez J, Smyth DR, Yanofsky MF, Meyerowitz EM; Medline: 92274452 "LEAFY controls floral meristem identity in Arabidopsis." Cell 1992;69:843-859.

[2120] 849. (G-patch) 

G-patch domain

[2121] This domain is found in a number of RNA binding proteins, and is also found in proteins that contain RNA
binding domains. This suggests that this domain may have an RNA binding function. This domain has seven highly
conserved glycines.

Number of members: 47

[2122] [1] Aravind L, Koonin EV; Medline: 10470032 "G-patch: a new conserved domain in eukaryotic RNA-processing
proteins and type D retroviral polyproteins." Trends Biochem Sci 1999;24:342-344.
[2123] 850. (Gram-ve_porins)
General diffusion Gram-negative porins signature
[2124] Cross-reference(s) PS00576; GRAM_NEG_PORIN

The outer membrane of Gram-negative bacteria acts as a molecular filter for hydrophilic compounds. Proteins, known
as porins [1], are responsible for the 'molecular sieve' properties of the outer membrane. Porins form large water-filled
channels which allow the diffusion of hydrophilic molecules into the periplasmic space. Some porins form general
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diffusion channels that allows any solutes up to a certain size (that size is known as the exclusion limit) to cross the 
membrane, while other porins are specific for a solute and contain a binding site for that solute inside the pores (these 
are known as selective porins). As porins are the major outer membrane proteins, they also serve as receptor sites 
for the binding of phages and bacteriocins. General diffusion porins generally assemble as trimers in the membrane
and the transmembrane core of these proteins is composed exclusively of beta strands [2]. It has been shown [3] that
a number of general porins are evolutionarily related; these porins are:

- Enterobacteria phoE.
- Enterobacteria ompC.
- Enterobacteria ompF.
- Enterobacteria nmpC.
- Bacteriophage PA-2 LC.
- Neisseria PI.A.
- Neisseria PI.B.


[2125] As a signature pattern a conserved region was selected, located in the C-terminal part of these proteins, 
which spans two putative transmembrane beta strands. 

[2126] Consensus pattern: [LIVMFY]-x(2)-G-x(2)-Y-x-F-x-K-x(2)-[SN]-[STAV]-[LIVMFYW]-V 

[1] Benz R., Bauer K., Eur. J. Biochem. 176:1-19(1988).

[2] Jap B.K., Walian P.J., Q. Rev. Biophys. 23:367-403(1990).

[3] Jeanteur D., Lakey J.H., Pattus F., Mol. Microbiol. 5:2153-2164(1991).

[2127] 851. (HlyD) 
HlyD family secretion proteins signature

[2128] Cross-reference(s) PS00543; HLYD_FAMILY

Gram-negative bacteria produce a number of proteins which are secreted into the growth medium by a mechanism
that does not require a cleaved N-terminal signal sequence. These proteins, while having different functions, require
the help of two or more proteins for their secretion across the cell envelope. Among these are a protein belonging to the
ABC transporters family (see the relevant entry <PDOC00185>) and a protein belonging to a family which is currently
composed [1 to 5] of the following members:


Gene    Species                                             Protein which is exported
hlyD    Escherichia coli                                    Hemolysin
appD    A.pleuropneumoniae                                  Hemolysin
lcnD    Lactococcus lactis                                  Lactococcin A
lktD    A.actinomycetemcomitans, Pasteurella haemolytica    Leukotoxin
rtxD    A.pleuropneumoniae                                  Toxin-III
cyaD    Bordetella pertussis                                Calmodulin-sensitive adenylate cyclase-hemolysin (cyclolysin)
cvaA    Escherichia coli                                    Colicin V
prtE    Erwinia chrysanthemi                                Extracellular proteases B and C
aprE    Pseudomonas aeruginosa                              Alkaline protease
emrA    Escherichia coli                                    Drugs and toxins
yjcR    Escherichia coli                                    Unknown


These proteins are evolutionarily related and consist of from 390 to 480 amino acid residues. They seem to be anchored
in the inner membrane by an N-terminal transmembrane region. Their exact role in the secretion process is not yet
known. The C-terminal section of these proteins is the best conserved region; a signature pattern from that region was
derived.

[2129] Consensus pattern: [LIVM]-x(2)-G-[LM]-x(3)-[STGAV>x-[LIVM^ 
[LIVMFYW](3) 

Sequences known to belong to this class detected by the pattern ALL, except for emrA and yjcR. 

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10 




References: 


[2130] 


[1] Gilson L., Mahanty H.K., Kolter R., EMBO J. 9:3875-3884(1990).
[2] Letoffe S., Delepelaire P., Wandersman C., EMBO J. 9:1375-1382(1990).
[3] Stoddard G.W., Petzel J.R., van Belkum M.J., Kok J., McKay L.L., Appl. Environ. Microbiol. 58:1952-1961(1992).
[4] Duong F., Lazdunski A., Cami B., Murgier M., Gene 121:47-54(1992).
[5] Lewis K., Trends Biochem. Sci. 19:119-123(1994).


[2131] 852. (IBR) 
In Between Ring fingers 


[2132] The IBR (In Between Ring fingers) domain is found to occur between pairs of ring fingers (zf-C3HC4). The
function of this domain is unknown. This domain has also been called the C6HC domain and DRIL (for double RING
finger linked) domain [2].

Number of members: 25


[2133] 

[1] Morett E, Bork P; Medline: 10366851 "A novel transactivation domain in parkin." Trends Biochem Sci 1999;24:

[2] van der Reijden BA, Erpelinck-Verschueren CA, Lowenberg B, Jansen JH; Medline: 99349709 "TRIADs: a new
class of proteins with a novel cysteine-rich signature." Protein Sci 1999;8:1557-1561.


[2134] 853. (IPPT)
IPP transferase

[1] Durand JM, Bjork GR, Kuwae A, Yoshikawa M, Sasakawa C; Medline: 97440126 "The modified nucleoside
2-methylthio-N6-isopentenyladenosine in tRNA of Shigella flexneri is required for expression of virulence genes."

[2] Boguta M, Hunter LA, Shen WC, Gillman EC, Martin NC, Hopper AK; Medline: 94187700 "Subcellular locations
of MOD5 proteins: mapping of sequences sufficient for targeting to mitochondria and demonstration that mitochondrial and nuclear isoforms commingle in the cytosol." Mol Cell Biol 1994;14:2298-2306.
[3] Gillman EC, Slusher LB, Martin NC, Hopper AK; Medline: 91[...] "MOD5 translation initiation sites determine
N6-isopentenyladenosine modification of mitochondrial and cytoplasmic tRNA." Mol Cell Biol 1991;11:2382-2390.

[2135] 854. (KE2) 

[2136] The function of members of this family is unknown, although they have been suggested to contain a DNA
binding leucine zipper motif [2].

Number of members: 9
[2137]

[1] Ha H, Abe K, Artzt K; Medline: 92084131 "Primary structure of the embryo-expressed gene KE2 from the mouse
H-2K region." Gene 1991;107:345-346.
[2] Shang HS, Wong SM, Tan HM, Wu M; Medline: 95129859 "YKE2, a yeast nuclear gene encoding a protein
showing homology to mouse KE2 and containing a putative leucine-zipper motif." Gene 1994;151:197-201.

[2138] 855. (Lipoprotein_6) 

Prokaryotic membrane lipoprotein lipid attachment site 

[2139] Cross-reference(s) PS00013; PROKAR_LIPOPROTEIN

In prokaryotes, membrane lipoproteins are synthesized with a precursor signal peptide, which is cleaved by a specific
lipoprotein signal peptidase (signal peptidase II). The peptidase recognizes a conserved sequence and cuts upstream
of a cysteine residue to which a glyceride-fatty acid lipid is attached [1]. Some of the proteins known to undergo such
processing currently include (for recent listings see [1,2,3]):
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Major outer membrane lipoprotein (murein-lipoproteins) (gene Ipp). 
Escherichia coli lipoprotein-28 (gene nlpA).
Escherichia coli lipoprotein-34 (gene nlpB).
Escherichia coli lipoprotein nlpC.
Escherichia coli lipoprotein nlpD.
Escherichia coli osmotically inducible lipoprotein B (gene osmB).
Escherichia coli osmotically inducible lipoprotein E (gene osmE).
Escherichia coli peptidoglycan-associated lipoprotein (gene pal).
Escherichia coli rare lipoproteins A and B (genes rlpA and rlpB).
Escherichia coli copper homeostasis protein cutF (or nlpE).
Escherichia coli plasmids traT proteins.
Escherichia coli Col plasmids lysis proteins.
A number of Bacillus beta-lactamases.
Bacillus subtilis periplasmic oligopeptide-binding protein (gene oppA).
Borrelia burgdorferi outer surface proteins A and B (genes ospA and ospB).
Borrelia hermsii variable major protein 21 (gene vmp21) and 7 (gene vmp7).
Chlamydia trachomatis outer membrane protein 3 (gene omp3).
Fibrobacter succinogenes endoglucanase cel-3.
Haemophilus influenzae proteins Pal and Pcp.
Klebsiella pullulanase (gene pulA).
Klebsiella pullulanase secretion protein pulS.
Mycoplasma hyorhinis protein p37.
Mycoplasma hyorhinis variant surface antigens A, B, and C (genes vlpABC).
Neisseria outer membrane protein H.8.
Pseudomonas aeruginosa lipopeptide (gene lppL).
Pseudomonas solanacearum endoglucanase egl.
Rhodopseudomonas viridis reaction center cytochrome subunit (gene cytC).
Rickettsia 17 Kd antigen.
Shigella flexneri invasion plasmid proteins mxiJ and mxiM.
Streptococcus pneumoniae oligopeptide transport protein A (gene amiA).
Treponema pallidum 34 Kd antigen.
Treponema pallidum membrane protein A (gene tmpA).
Vibrio harveyi chitobiase (gene chb).
Yersinia virulence plasmid protein yscJ.
Halocyanin from Natronobacterium pharaonis [4], a membrane-associated copper-binding protein. (This is the first
archaebacterial protein known to be modified in such a fashion.)

[2140] From the precursor sequences of all these proteins, a consensus pattern and a set of rules to identify this 
type of post-translational modification were derived. 

[2141] Consensus pattern: {DERK}(6)-[LIVMFWSTAG](2)-[LIVMFYSTAGCQ]-[AGS]-C [C is the lipid attachment site]. Additional rules: 1)

[2142] The cysteine must be between positions 15 and 35 of the sequence in consideration. 2) There must be at least one Lys or one Arg in the first seven positions of the sequence. Sequences known to belong to this class detected by the pattern: ALL. Other sequence(s) detected in SWISS-PROT: some 100 prokaryotic proteins. Some of them are not membrane lipoproteins, but at least half of them could be.
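
The consensus pattern and the two additional rules can be applied together to a precursor sequence. The following is a minimal sketch, assuming one-letter amino acid codes and 1-based residue numbering; the function name and the test sequence are hypothetical illustrations and are not part of the original disclosure.

```python
import re

# The consensus pattern quoted above, written as a regular expression.
# {DERK}(6) means "six residues, none of which is D, E, R or K".
# The lookahead lets overlapping candidate sites be reported.
LIPOBOX = re.compile(r"(?=([^DERK]{6}[LIVMFWSTAG]{2}[LIVMFYSTAGCQ][AGS]C))")

def lipid_attachment_sites(seq):
    """Return 1-based positions of cysteines meeting the pattern and both rules."""
    seq = seq.upper()
    # Rule 2: at least one Lys or Arg in the first seven positions.
    if not re.search(r"[KR]", seq[:7]):
        return []
    sites = []
    for m in LIPOBOX.finditer(seq):
        # The pattern spans 11 residues and the cysteine is the last one.
        cys_pos = m.start(1) + 11
        # Rule 1: the cysteine must lie between positions 15 and 35.
        if 15 <= cys_pos <= 35:
            sites.append(cys_pos)
    return sites

# Hypothetical precursor-like sequence used only for illustration.
print(lipid_attachment_sites("MKATKLVLGAVILGSTLLAGCSSNAK"))  # [21]
```

As the text notes, a match is only a prediction: some sequences detected by the pattern are not membrane lipoproteins.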

References 

[2143] 

[1] Hayashi S., Wu H.C., J. Bioenerg. Biomembr. 22:451-471(1990).
[2] Klein P., Somorjai R.L., Lau P.C.K., Protein Eng. 2:15-20(1988).
[3] von Heijne G., Protein Eng. 2:531-534(1989).
[4] Mattar S., Scharf B., Kent S.B.H., Rodewald K., Oesterhelt D., Engelhard M., J. Biol. Chem. 269:14939-14945(1994).

[2144] 856. (Lipoprotein_7) 
Adhesin lipoprotein

[2145] This family consists of the p50 and variable adherence-associated antigen (Vaa) adhesins from Mycoplasma hominis. M. hominis is a mycoplasma associated with human urogenital diseases, pneumonia, and septic arthritis [1]. An adhesin is a cell surface molecule that mediates adhesion to other cells or to the surrounding surface or substrate. The Vaa antigen is a 50-kDa surface lipoprotein that has four tandem repetitive DNA sequences encoding a periodic peptide structure, and is highly immunogenic in the human host [1]. p50 is also a 50-kDa lipoprotein, having three repeats A, B and C, that may be a tetramer of 191 kDa in its native environment [2].


Number of members: 18

[2146]

[1] Zhang Q, Wise KS; Medline: 96294788 "Molecular basis of size and antigenic variation of a Mycoplasma hominis adhesin encoded by divergent vaa genes." Infect Immun 1996;64:2737-2744.
[2] Henrich B, Kitzerow A, Feldmann RC, Schaal H, Hadding U; Medline: 97047675 "Repetitive elements of the Mycoplasma hominis adhesin p50 can be differentiated by monoclonal antibodies." Infect Immun 1996;64:4027-4034.

[2147] 857. (MaoC_like)
MaoC like domain

[2148] The MaoC protein is found to share similarity with a wide variety of enzymes: estradiol 17 beta-dehydrogenase 4, peroxisomal hydratase-dehydrogenase-epimerase, and fatty acid synthase beta subunit. All these enzymes contain other domains. This domain is also present in the NodN nodulation protein N. No specific function has been assigned to this region of any of these proteins. The maoC gene is part of an operon with maoA, which is involved in the synthesis of monoamine oxidase [1].

Number of members: 46

[2149] [1] Sugino H, Sasaki M, Azakami H, Yamashita M, Murooka Y; Medline: 96235221 "A monoamine-regulated Klebsiella aerogenes operon containing the monoamine oxidase structural gene (maoA) and the maoC gene." J Bacteriol 1992;174:2485-2492.
[2150] 858. (MSP) 

Manganese-stabilizing protein / photosystem II polypeptide 

[2151] This family consists of the 33 kDa photosystem II polypeptide from the oxygen evolving complex (OEC) of plants and cyanobacteria. The protein is also known as the manganese-stabilizing protein as it is associated with the manganese complex of the OEC and may provide the ligands for the complex [1].

Number of members: 17 

[2152] [1] Philbrick JB, Zilinskas BA; Medline: 88334494 "Cloning, nucleotide sequence and mutational analysis of the gene encoding the Photosystem II manganese-stabilizing polypeptide of Synechocystis 6803." Mol Gen Genet 1988;212:418-425.

[2153] 859. (NAC)
[2154] [1] Makarova KS, Aravind L, Galperin MY, Grishin NV, Tatusov RL, Wolf YI, Koonin EV; Medline: 99342100 "Comparative genomics of the Archaea (Euryarchaeota): evolution of conserved protein families, the stable core, and the variable shell." Genome Res 1999;9:608-628.


Number of members: 27 


[2155] 860. (Nop) 

Putative snoRNA binding domain

[2156] This family consists of various pre-RNA processing ribonucleoproteins. The function of the aligned region is unknown; however, it may be a common RNA or snoRNA or Nop1p binding domain. Nop5p (Nop58p) Swiss:Q12499 from yeast is the protein component of a ribonucleoprotein required for pre-18S rRNA processing and is suggested to function with Nop1p in a snoRNA complex [1]. Nop56p Swiss:O00567 and Nop5p interact with Nop1p and are required for ribosome biogenesis [2]. Prp31p Swiss:P49704 is required for pre-mRNA splicing in S. cerevisiae [3].

Number of members: 23
[2157]

[1] Wu P, Brockenbrough JS, Metcalfe AC, Chen S, Aris JP; Medline: 98298165 "Nop5p is a small nucleolar ribonucleoprotein component required for pre-18 S rRNA processing in yeast." J Biol Chem 1998;273:16453-16463.
[2] Gautier T, Berges T, Tollervey D, Hurt E; Medline: 8038777 "Nucleolar KKE/D repeat proteins Nop56p and Nop58p interact with Nop1p and are required for ribosome biogenesis." Mol Cell Biol 1997;17:7088-7098.
[3] Weidenhammer EM, Singh M, Ruiz-Noriega M, Woolford JL Jr; Medline: 96184869 "The PRP31 gene encodes a novel protein required for pre-mRNA splicing in Saccharomyces cerevisiae." Nucleic Acids Res 1996;24:1164-1170.

[2158] 861. (Nramp) 

Natural resistance-associated macrophage protein 

The natural resistance-associated macrophage protein (NRAMP) family consists of Nramp1, Nramp2, and yeast proteins Smf1 and Smf2. The NRAMP family is a novel family of functionally related proteins defined by a conserved hydrophobic core of ten transmembrane domains [5]. The members of this family of membrane proteins are divalent cation transporters. Nramp1 is an integral membrane protein expressed exclusively in cells of the immune system and is recruited to the membrane of a phagosome upon phagocytosis [1]. By controlling divalent cation concentrations, Nramp1 may regulate the intraphagosomal replication of bacteria [1]. Mutations in Nramp1 may genetically predispose an individual to susceptibility to diseases including leprosy and tuberculosis; conversely, this might provide protection from rheumatoid arthritis [1]. Nramp2 is a multiple divalent cation transporter for Fe2+, Mn2+ and Zn2+, amongst others. It is expressed at high levels in the intestine and is a major transferrin-independent iron uptake system in mammals [1]. The yeast proteins Smf1 and Smf2 may also transport divalent cations [3].

Number of members: 36
[2159]

[1] Govoni G, Gros P; Medline: 98383996 "Macrophage NRAMP1 and its role in resistance to microbial infections." Inflamm Res 1998;47:277-284.
[2] Agranoff DD, Krishna S; Medline: 98294035 "Metal ion homeostasis and intracellular parasitism." Mol Microbiol 1998;28:403-412.
[3] Pinner E, Gruenheid S, Raymond M, Gros P; Medline: 98030569 "Functional complementation of the yeast divalent cation transporter family SMF by NRAMP2, a member of the mammalian natural resistance-associated macrophage protein family." J Biol Chem 1997;272:28933-28938.
[4] Cellier M, Belouchi A, Gros P; Medline: 96402487 "Resistance to intracellular infections: comparative genomic analysis of Nramp." Trends Genet 1996;12:201-204.
[5] Cellier M, Prive G, Belouchi A, Kwan T, Rodrigues V, Chia W, Gros P; Medline: 96036029 "Nramp defines a family of membrane proteins." Proc Natl Acad Sci U S A 1995;92:10089-10093.

[2160] 862. (NTP_transf_2)
Nucleotidyltransferase domain

Members of this family belong to a large family of nucleotidyltransferases [1].

Number of members: 83

[2161] [1] Holm L, Sander C; Medline: 96005605 "DNA polymerase beta belongs to an ancient nucleotidyltransferase superfamily." Trends Biochem Sci 1995;20:345-347.
[2162] 863. (Paramyxo_P)

Paramyxovirus P phosphoprotein 

[2163] This family consists of paramyxovirus P phosphoprotein from Sendai virus and human and bovine parainfluenza viruses. The P protein is an essential part of the viral RNA polymerase complex formed from the P and L proteins [1]. The exact role of the P protein in this complex is unknown, but it is involved in multiple protein-protein interactions and in binding the polymerase complex to the nucleocapsid or ribonucleoprotein template [1]. It also appears to be important for the proper folding of the L protein [1]. The paramyxoviruses have a negative sense ssRNA genome.

Number of members: 15

[2164]

[1] Bowman MC, Smallwood S, Moyer SA; Medline: 99329169 "Dissection of Individual Functions of the Sendai Virus Phosphoprotein in Transcription." J Virol 1999;73:6474-6483.
[2] Matsuoka Y, Curran J, Pelet T, Kolakofsky D, Ray R, Compans RW; Medline: 91237868 "The P gene of human parainfluenza virus type 1 encodes P and C proteins but not a cysteine-rich V protein." J Virol 1991;65:3406-3410.

[2165] 864. (Patatin)

[2166] This family consists of various patatin glycoproteins from plants. The patatin protein accounts for up to 40% of the total soluble protein in potato tubers [2]. Patatin is a storage protein but it also has the enzymatic activity of lipid acyl hydrolase, catalysing the cleavage of fatty acids from membrane lipids [2].

Number of members: 21
[2167]

[1] Banfalvi Z, Kostyal Z, Barta E; Medline: 95107249 "Solanum brevidens possesses a non-sucrose-inducible patatin gene." Mol Gen Genet 1994;245:517-522.
[2] Mignery GA, Pikaard CS, Park WD; Medline: 88226014 "Molecular characterization of the patatin multigene family of potato." Gene 1988;62:27-44.

[2168] 865. (Pentapeptide_2) 
Pentapeptide repeats (8 copies)

[2169] These repeats are found in many mycobacterial proteins. These repeats are most common in the PPE family of proteins, where they are found in the MPTR subfamily of PPE proteins. The function of these repeats is unknown. The repeat can be approximately described as XNXGX, where X can be any amino acid. These repeats are similar to Pentapeptide [1]; however, it is not clear if these two families are structurally related.
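
The approximate repeat description XNXGX lends itself to a simple scan for tandem copies. The sketch below is an illustration only: the function name, the default minimum of two copies, and the test sequence are assumptions, and the family itself is defined by a profile rather than by this regular expression.

```python
import re

# One repeat unit X-N-X-G-X, where X is any amino acid (one-letter code).
UNIT = r"[A-Z]N[A-Z]G[A-Z]"

def find_pentapeptide_runs(seq, min_copies=2):
    """Return (1-based start, copy count) for each run of tandem XNXGX units."""
    run = re.compile(r"(?:%s){%d,}" % (UNIT, min_copies))
    return [(m.start() + 1, len(m.group()) // 5)
            for m in run.finditer(seq.upper())]

# Hypothetical sequence with three tandem units: ANTGA, SNIGS, GNLGF.
print(find_pentapeptide_runs("MKTANTGASNIGSGNLGFPPP"))  # [(4, 3)]
```

Because the description is only approximate, short runs will also occur by chance in unrelated proteins, so a higher `min_copies` is the more conservative choice.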


Number of members: 362 
[2170] 

[1] Bateman A, Murzin A, Teichmann SA; Medline: 98318059 "Structure and distribution of pentapeptide repeats in bacteria." Protein Sci 1998;7:1477-1480.
[2] Cole ST, Brosch R, Parkhill J, Garnier T, Churcher C, Harris D, Gordon SV, Eiglmeier K, Gas S, Barry CE 3rd, Tekaia F, Badcock K, Basham D, Brown D, Chillingworth T, Connor R, Davies R, Devlin K, Feltwell T, Gentles S, Hamlin N, Holroyd S, Hornsby T, Jagels K, Barrell BG; Medline: 98295987 "Deciphering the biology of Mycobacterium tuberculosis from the complete genome sequence." Nature 1998;393:537-544.

[2171] 866. (Peptidase_C13)

This family is known as the hemoglobinase family because it contains a globin degrading enzyme from blood parasites, Swiss:P42665. However, relatives are found in plants and other organisms that have other functions. Members of this family are asparaginyl peptidases [1].

Number of members: 26

[2172] [1] Chen JM, Dando PM, Rawlings ND, Brown MA, Young NE, Stevens RA, Hewitt E, Watts C, Barrett AJ; Medline: 97218252 "Cloning, isolation, and characterization of mammalian legumain, an asparaginyl endopeptidase." J Biol Chem 1997;272:8090-8098.
[2173] 867. (Pro_dh)
Proline dehydrogenase 


Number of members: 25 

[2174] [1] Ling M, Allen SW, Wood JM; Medline: 95055736 "Sequence analysis identifies the proline dehydrogenase and delta 1-pyrroline-5-carboxylate dehydrogenase domains of the multifunctional Escherichia coli PutA protein." J Mol Biol 1994;243:950-956.
[2175] 868. (PsbP) 

[2176] This family consists of the 23 kDa subunit of the oxygen evolving system of photosystem II, or PsbP, from various plants (where it is encoded by the nuclear genome) and Cyanobacteria. The 23 kDa PsbP protein is required for PSII to be fully operational in vivo; it increases the affinity of the water oxidation site for Cl- and provides the conditions required for high affinity binding of Ca2+ [2].

Number of members: 25 

[2177]

[1] Rova EM, Mc Ewen B, Fredriksson PO, Styring S; Medline: 97067138 "Photoactivation and photoinhibition are competing in a mutant of Chlamydomonas reinhardtii lacking the 23-kDa extrinsic subunit of photosystem II." J Biol Chem 1996;271:28918-28924.
[2] Kochhar A, Khurana JP, Tyagi AK; Medline: 97191538 "Nucleotide sequence of the psbP gene encoding precursor of 23-kDa polypeptide of oxygen-evolving complex in Arabidopsis thaliana and its expression in the wild-type and a constitutively photomorphogenic mutant." DNA Res 1996;3:277-285.

[2178] 869. (PUA)

[2179] The PUA domain, named after PseudoUridine synthase and Archaeosine transglycosylase, was detected in archaeal and eukaryotic pseudouridine synthases, archaeal archaeosine synthases, a family of predicted ATPases that may be involved in RNA modification, and a family of predicted archaeal and bacterial rRNA methylases. Additionally, the PUA domain was detected in a family of eukaryotic proteins that also contain a domain homologous to the translation initiation factor eIF1/SUI1; these proteins may comprise a novel type of translation factors. Unexpectedly, the PUA domain was detected also in bacterial and yeast glutamate kinases; this is compatible with the demonstrated role of these enzymes in the regulation of the expression of other genes [1]. It is predicted that the PUA domain is an RNA binding domain.

Number of members: 48

[2180] [1] Aravind L, Koonin EV; Medline: 99193178 "Novel predicted RNA-binding domains associated with the translation machinery." J Mol Evol 1999;48:291-302.
[2181] 870. (RF1) 
eRF1-like proteins

[2182] Members of this family are peptide chain release factors. The eukaryotic Release Factor 1 proteins (eRF1s) are involved in termination of translation. The eRF1 protein is functional for all stop codons and appears to abolish read-through of these codons. This family also includes other proteins for which the precise molecular function is unknown. Many of them are from Archaebacteria. These proteins may also be involved in translation termination, but this awaits experimental verification.

Number of members: 25 

[2183] 

[1] Frolova L, Le Goff X, Rasmussen HH, Cheperegin S, Drugeon G, Kress M, Arman I, Haenni AL, Celis JE, Philippe M, et al; Medline: 95082951 "A highly conserved eukaryotic protein family possessing properties of polypeptide chain release factor" [see comments] Nature 1994;372:701-703.
[2] Drugeon G, Jean-Jean O, Frolova L, Le Goff X, Philippe M, Kisselev L, Haenni AL; Medline: 97315314 "Eukaryotic release factor 1 (eRF1) abolishes readthrough and competes with suppressor tRNAs at all three termination codons in messenger RNA." Nucleic Acids Res 1997;25:2254-2258.

[2184] 871. (Ribosomal_L14e) 
Ribosomal protein L14
[2185] This family includes the eukaryotic ribosomal protein L14.

Number of members: 15
[2186] 872. (Ribosomal_S27)

Ribosomal protein S27a

[2187] This family of ribosomal proteins consists mainly of the 40S ribosomal protein S27a, which is synthesized as a C-terminal extension of ubiquitin (CEP). The S27a domain comprises the C-terminal half of the protein. The synthesis of ribosomal proteins as extensions of ubiquitin promotes their incorporation into nascent ribosomes by a transient metabolic stabilization and is required for efficient ribosome biogenesis [3]. The ribosomal extension protein S27a contains a basic region that is proposed to form a zinc finger; its fusion gene is proposed as a mechanism to maintain a fixed ratio between ubiquitin, necessary for degrading proteins, and ribosomes, a source of proteins [2].

Number of members: 36 

[2188] 873. (Spermine_synth)
Spermine/spermidine synthase

[2189] Spermine and spermidine are polyamines. This family includes spermidine synthase, which catalyses the fifth (last) step in the biosynthesis of spermidine from arginine, and spermine synthase.

Number of members: 39

[2190]

[1] Mezquita J, Pau M, Mezquita C; Medline: 97449308 "Characterization and expression of two chicken cDNAs encoding ubiquitin fused to ribosomal proteins of 52 and 80 amino acids." Gene 1997;195:313-319.
[2] Redman KL, Rechsteiner M; Medline: 89181932 "Identification of the long ubiquitin extension as ribosomal protein S27a." Nature 1989;338:438-440.
[3] Finley D, Bartel B, Varshavsky A; Medline: 89181925 "The tails of ubiquitin precursors are ribosomal proteins whose fusion to ubiquitin facilitates ribosome biogenesis." Nature 1989;338:394-401.

[2191] 874. (Surp) Surp module

[1] Denhez F, Lafyatis R; Medline: 94266805 "Conservation of regulated alternative splicing and identification of functional domains in vertebrate homologs to the Drosophila splicing regulator, suppressor-of-white-apricot." J Biol Chem 1994;269:16170-16179.

[2192] This domain is also known as the SWAP domain. SWAP stands for Suppressor-of-White-APricot. It has been suggested that these domains may be RNA binding [1].
Number of members: 32

[2193] 875. (TFIIE) TFIIE alpha subunit
The general transcription factor TFIIE has an essential role in eukaryotic transcription initiation together with RNA polymerase II and other general factors. Human TFIIE consists of two subunits, TFIIE-alpha Swiss:P29083 and TFIIE-beta Swiss:P29084, and joins the preinitiation complex after RNA polymerase II and TFIIF [1]. This family consists of the conserved amino terminal region of eukaryotic TFIIE-alpha [2] and proteins from archaebacteria that are presumed to be TFIIE-alpha subunits also, Swiss:O29501 [3].


Number of members: 12 


[2194] 


[1] Ohkuma Y, Sumimoto H, Hoffmann A, Shimasaki S, Horikoshi M, Roeder RG; Medline: 92065982 "Structural motifs and potential sigma homologies in the large subunit of human general transcription factor TFIIE." Nature 1991;354:398-401.
[2] Ohkuma Y, Hashimoto S, Roeder RG, Horikoshi M; Medline: 93087200 "Identification of two large subdomains in TFIIE-alpha on the basis of homology between Xenopus and human sequences." Nucleic Acids Res 1992;20:5838-5838.
[3] Klenk HP, Clayton RA, Tomb JF, White O, Nelson KE, Ketchum KA, Dodson RJ, Gwinn M, Hickey EK, Peterson JD, Richardson DL, Kerlavage AR, Graham DE, Kyrpides NC, Fleischmann RD, Quackenbush J, Lee NH, Sutton GG, Gill S, Kirkness EF, Dougherty BA, McKenney K, Adams MD, Loftus B, Venter JC, et al; Medline: 98049343 "The complete genome sequence of the hyperthermophilic, sulphate-reducing archaeon Archaeoglobus fulgidus." Nature 1997;390:364-370.

[2195] 876. (Transglut_core) 

[2196] Cross-reference(s) PS00547; TRANSGLUTAMINASES

Transglutaminases (EC 2.3.2.13) (TGase) [1,2] are calcium-dependent enzymes that catalyze the cross-linking of proteins by promoting the formation of isopeptide bonds between the gamma-carboxyl group of a glutamine in one polypeptide chain and the epsilon-amino group of a lysine in a second polypeptide chain. TGases also catalyze the conjugation of polyamines to proteins. The best known transglutaminase is blood coagulation factor XIII, a plasma tetrameric protein composed of two catalytic A subunits and two non-catalytic B subunits. Factor XIII is responsible for cross-linking fibrin chains, thus stabilizing the fibrin clot. Other forms of transglutaminases are widely distributed in various organs, tissues and body fluids. Sequence data are available for the following forms of TGase:

Transglutaminase K (TGase K), a membrane-bound enzyme found in mammalian epidermis and important for the formation of the cornified cell envelope (gene TGM1).
Tissue transglutaminase (TGase C), a monomeric ubiquitous enzyme located in the cytoplasm (gene TGM2).
Transglutaminase 3, responsible for the later stages of cell envelope formation in the epidermis and the hair follicle (gene TGM3).
Transglutaminase 4 (gene TGM4).

[2197] A conserved cysteine is known to be involved in the catalytic mechanism of TGases. The erythrocyte membrane band 4.2 protein, which probably plays an important role in regulating the shape of erythrocytes and their mechanical properties, is evolutionarily related to TGases. However, the active site cysteine is substituted by an alanine and the 4.2 protein does not show TGase activity.
[2198] Consensus pattern: [GT]-Q-[CA]-W-V-x-[SA]-[GA]-[IVT]-x(2)-T-x-[LMSC]-R-[CSA]-[LV]-G [The first C is the active site residue] Sequences known to belong to this class detected by the pattern: ALL. Other sequence(s) detected in SWISS-PROT: NONE.
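
Because the third position of the pattern admits either cysteine or alanine, a match alone does not show that a protein is catalytically active; the residue at that position distinguishes TGases from band 4.2-like relatives. The following sketch illustrates this under stated assumptions: the function name and both test sequences are hypothetical and are not taken from the original disclosure.

```python
import re

# The consensus pattern quoted above as a regular expression.
# The parenthesised group captures the active-site position ([CA]).
TGASE = re.compile(r"[GT]Q([CA])WV.[SA][GA][IVT].{2}T.[LMSC]R[CSA][LV]G")

def tgase_active_sites(seq):
    """Return (1-based position, residue, is_catalytic) for each pattern match."""
    hits = []
    for m in TGASE.finditer(seq.upper()):
        residue = m.group(1)
        # Cysteine is the catalytic residue; alanine marks a band 4.2-like protein.
        hits.append((m.start(1) + 1, residue, residue == "C"))
    return hits

# Hypothetical sequences: one with Cys, one with the Cys -> Ala substitution.
print(tgase_active_sites("AAYGQCWVFAGVLNTVLRCLGIP"))  # [(6, 'C', True)]
print(tgase_active_sites("AAYGQAWVFAGVLNTVLRCLGIP"))  # [(6, 'A', False)]
```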

[2199] [1] Ichinose A., Bottenus R.E., Davie E.W., J. Biol. Chem. 265:13411-13414(1990). [2] Greenberg C.S., Birckbichler P.J., Rice R.H., FASEB J. 5:3071-3077(1991).
[2200] 877. (TruB_N) TruB family pseudouridylate synthase (N terminal domain)

Members of this family are involved in modifying bases in RNA molecules. They carry out the conversion of uracil bases to pseudouridine. This family includes TruB, a pseudouridylate synthase that specifically converts uracil 55 to pseudouridine in most tRNAs. This family also includes Cbf5p, which modifies rRNA [2].

Number of members: 33

[2201]

[1] Nurse K, Wrzesinski J, Bakin A, Lane BG, Ofengand J; Medline: 96079944 "Purification, cloning, and properties of the tRNA psi 55 synthase from Escherichia coli." RNA 1995;1:102-112.
[2] Lafontaine DLJ, Bousquet-Antonelli C, Henry Y, Caizergues-Ferrer M, Tollervey D; Medline: 98139521 "The box H + ACA snoRNAs carry Cbf5p, the putative rRNA pseudouridine synthase." Genes Dev 1998;12:527-537.

[2202] 878. (UDPGP) UTP--glucose-1-phosphate uridylyltransferase

This family consists of UTP--glucose-1-phosphate uridylyltransferases, EC:2.7.7.9, also known as UDP-glucose pyrophosphorylase (UDPGP) and glucose-1-phosphate uridylyltransferase. UTP--glucose-1-phosphate uridylyltransferase catalyses the interconversion of MgUTP + glucose-1-phosphate and UDP-glucose + MgPPi [1]. UDP-glucose is an important intermediate in mammalian carbohydrate interconversion, involved in various metabolic roles depending on tissue type [1]. In Dictyostelium (slime mold), mutants in this enzyme abort the development cycle [2]. Also within the family is UDP-N-acetylglucosamine pyrophosphorylase Swiss:Q16222 or AGX1 [3] and two hypothetical proteins from Borrelia burgdorferi, the Lyme disease spirochaete, Swiss:O51893 and Swiss:O51036.

Number of members: 18 

[2203]

[1] Duggleby RG, Chao YC, Huang JG, Peng HL, Chang HY; Medline: 96202932 "Sequence differences between human muscle and liver cDNAs for UDPglucose pyrophosphorylase and kinetic properties of the recombinant enzymes expressed in Escherichia coli." Eur J Biochem 1996;235:173-179.
[2] Ragheb JA, Dottin RP; Medline: 87231075 "Structure and sequence of a UDP glucose pyrophosphatase gene of Dictyostelium discoideum." Nucleic Acids Res 1987;15:3891-3906.
[3] Mio T, Yabe T, Arisawa M, Yamada-Okabe H; Medline: 98269105 "The eukaryotic UDP-N-acetylglucosamine pyrophosphorylases. Gene cloning, protein expression, and catalytic mechanism." J Biol Chem 1998;273:14392-14397.

[2204] 879. (UPF0044) Uncharacterized protein family UPF0044 signature
Cross-reference(s) PS01301; UPF0044
The following uncharacterized proteins have been shown [1] to be highly similar:
- Bacillus subtilis hypothetical protein yqeI.
- Escherichia coli hypothetical protein yhbY and HI1333, the corresponding Haemophilus influenzae protein.
- Methanococcus jannaschii hypothetical protein MJ0652.

These are small proteins of 10 to 15 Kd. They can be picked up in the database by the following pattern. This pattern is located in the N-terminal part of these proteins.

[2205] Consensus pattern: L-[ST]-x(3)-K-x(3)-[KR]-[SGA]-x-[GA]-H-x-L-x-P-[LIV]-x(2)-[LIV]-[GA]-x(2)-G Sequences known to belong to this class detected by the pattern: ALL. Other sequence(s) detected in SWISS-PROT: NONE.
[2206] 880. (zf-A20) A20-like zinc finger
A20- (an inhibitor of cell death)-like zinc fingers. The zinc finger mediates self-association in A20. These fingers also mediate IL-1-induced NF-kappa B activation.

Number of members: 22

[2207]

[1] Heyninck K, Beyaert R; Medline: 99126071 "The cytokine-inducible zinc finger protein A20 inhibits IL-1-induced NF-kappaB activation at the level of TRAF6." FEBS Lett 1999;442:147-150.
[2] De Valck D, Heyninck K, Van Criekinge W, Contreras R, Beyaert R, Fiers W; Medline: 96390831 "A20, an inhibitor of cell death, self-associates by its zinc finger domain." FEBS Lett 1996;384:61-64.
[3] Song HY, Rothe M, Goeddel DV; Medline: 96270609 "The tumor necrosis factor-inducible zinc finger protein A20 interacts with TRAF1/TRAF2 and inhibits NF-kappaB activation." Proc Natl Acad Sci U S A 1996;93:6721-6725.
[4] Opipari AW Jr, Boguski MS, Dixit VM; Medline: 90368626 "The A20 cDNA induced by tumor necrosis factor alpha encodes a novel type of zinc finger protein." J Biol Chem 1990;265:14705-14708.

[2208] 881. (zf-PARP)

Poly(ADP-ribose) polymerase zinc finger domain 

Cross-reference(s) PS00347; PARP_ZN_FINGER_1 PS50064; PARP_ZN_FINGER_2 

[2209] Poly(ADP-ribose) polymerase (EC 2.4.2.30) (PARP) [1,2] is a eukaryotic enzyme that catalyzes the covalent attachment of ADP-ribose units from NAD(+) to various nuclear acceptor proteins. This post-translational modification of nuclear proteins is dependent on DNA. It appears to be involved in the regulation of various important cellular processes such as differentiation, proliferation and tumor transformation, as well as in the regulation of the molecular events involved in the recovery of the cell from DNA damage. Structurally, PARP, about 1000 amino acid residues long, consists of three distinct domains: an N-terminal zinc-dependent DNA-binding domain, a central automodification domain and a C-terminal NAD-binding domain. The DNA-binding region contains a pair of zinc finger domains which have been shown to bind DNA in a zinc-dependent manner. The zinc finger domains of PARP seem to bind specifically to single-stranded DNA. DNA ligase III [3] contains, in its N-terminal section, a single copy of a zinc finger highly similar to those of PARP.

[2210] Consensus pattern: C-[KR]-x-C-x(3)-I-x-K-x(3)-[RG]-x(16,18)-W-[FYH]-H-x(2)-C [The three C's and the H are zinc ligands] Sequences known to belong to this class detected by the pattern: ALL. Other sequence(s) detected in SWISS-PROT: NONE. Sequences known to belong to this class detected by the profile: ALL. Other sequence(s) detected in SWISS-PROT: NONE.
[2211] Note: This documentation entry is linked to both signature patterns and a profile. As the profile is much more sensitive than the patterns, you should use it if you have access to the necessary software tools to do so.
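
Where only the signature pattern is available, it can be applied by translating the PROSITE-style notation into a regular expression. The converter below is a minimal sketch that handles only the notation used in this section (single residues, `x`, `[...]`, `{...}` and repeat counts such as `(3)` or `(16,18)`); the function name and the test sequence are hypothetical, and this does not replace the more sensitive profile search recommended above.

```python
import re

def prosite_to_regex(pattern):
    """Translate a simple PROSITE-style pattern into a regular expression string."""
    element_re = r"(x|[A-Z]|\[[A-Z]+\]|\{[A-Z]+\})(?:\((\d+)(?:,(\d+))?\))?"
    parts = []
    for element in pattern.split("-"):
        m = re.fullmatch(element_re, element)
        if m is None:
            raise ValueError("unsupported element: %r" % element)
        residue, low, high = m.groups()
        if residue == "x":
            residue = "."                           # any amino acid
        elif residue.startswith("{"):
            residue = "[^" + residue[1:-1] + "]"    # excluded residues
        if low and high:
            residue += "{%s,%s}" % (low, high)      # variable-length gap
        elif low:
            residue += "{%s}" % low                 # fixed repeat count
        parts.append(residue)
    return "".join(parts)

PARP_ZN = "C-[KR]-x-C-x(3)-I-x-K-x(3)-[RG]-x(16,18)-W-[FYH]-H-x(2)-C"
print(prosite_to_regex(PARP_ZN))
# C[KR].C.{3}I.K.{3}[RG].{16,18}W[FYH]H.{2}C

# Hypothetical sequence built to satisfy the pattern (16-residue spacer).
finger = "CKACAAAIAKAAAR" + "A" * 16 + "WFHAAC"
print(bool(re.search(prosite_to_regex(PARP_ZN), finger)))  # True
```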

[1] Althaus F.R., Richter C.R., Mol. Biol. Biochem. Biophys. 37:1-126(1987).
[2] de Murcia G., Menissier de Murcia J., Trends Biochem. Sci. 19:172-176(1994).
[3] Wei Y.-F., Robins P., Carter K., Caldecott K., Pappin D.J.C., Yu G.-L., Wang R.-P., Shell B.K., Nash R.A., Schar P., Barnes D.E., Haseltine W.A., Lindahl T., Mol. Cell. Biol. 15:3206-3216(1995).

A. Asparaginase 2

[2212] Asparaginase II (L-asparagine aminohydrolase II) is an extracellular protein that may be associated with the cell wall and whose expression is affected by the availability of nitrogen. Asparaginase II catalyzes the reaction of L-Asparagine + H2O = L-Aspartate + NH3. As many leukemias have high requirements for aspartic acid, asparaginase II proteins are useful as reagents for screening compounds for activity as leukemia chemotherapy products. Asparaginase II protein can also be over- or under-expressed to alter amino acid content in plant tissues or to modify nitrogen fixation and/or nitrogen metabolism in plants.

[2213] Ref: Bon et al. (1997) Appl Biochem Biotechnol 63-65: 203-12

B. Chloroa_b-bind

[2214] Chlorophyll a-b binding proteins are located in the thylakoid membranes of the chloroplast and bind chlorophyll a and chlorophyll b, thereby triggering a chemical reaction (photosynthesis). These proteins are useful in controlling the rate, efficiency and/or output of photosynthesis. Overexpression of chlorophyll a-b binding proteins is expected to increase the rate of photosynthesis.

Ref: Leutwiler et al. (1986) Nucleic Acids Res 14: 4051-64 Brandt et al. (1992) Plant Mol Biol 19: 699-703

C. DMRL synthase

[2215] DMRL Synthase (6,7-Dimethyl-8-Ribityllumazine Synthase) catalyzes the last step in riboflavin (Vitamin B2) synthesis, condensing 5-amino-6-(1-D)-ribityl-amino-2,4(1H,3H)-Pyrimidinedione with L-3,4-Dihydroxy-2-Butanone 4-Phosphate, producing 6,7-Dimethyl-8-(1-D-Ribityl)Lumazine. The enzyme forms a homopentamer. Engineering of these proteins or those with homologous sequences/structures may allow control of the amounts of vitamin B2 available in plants and/or accumulation of pigment, as well as altering reactions requiring hydrogen ion carriers/transmitters.
Ref: Garcia-Ramirez et al. (1995) J Biol Chem 270: 23801-7

D. E1_N 

[2216] These proteins are ATP-dependent DNA helicases that are required for initiation of viral DNA replication. They form a complex with the viral E2 protein. The E1-E2 complex binds to the replication origin that contains binding sites for both proteins. The majority of sequences known for this group of proteins are from various papillomaviruses, a type of double stranded DNA virus. In plants, the prototype double stranded DNA virus is Cauliflower Mosaic virus (CaMV). Manipulation of these proteins, especially to produce variant proteins that form non-productive complexes, enables production of plants that are resistant to infection by double stranded DNA viruses.

Ref: Yang et al. (1993) PNAS USA 90: 5086-90
Ustav and Stenlund (1991) EMBO J 10: 449-57
Callaway et al. (1996) Mol Plant Microbe Interact 9: 810-8

E. EF1_G 

[2217] Elongation Factor-1 is composed of four subunits; alpha, beta, delta and gamma. Gamma subunits wo pro- 
45 sumed to play a role in anchoring the complex to other cellular components. Studies of EF-1 genes in plants suggests 
that different forms of the EF-1 subunits may be expressed in particular organs or in response to stress. Manipulation 
of the activity of these proteins, either by altered expression level or by structural mutation, may result in the accumu- 
lation of a particular protein in a chosen organ or allow production of particular proteins during stress conditions. 

Ref: Kinzy et al. (1994) NAR 22: 2703-7 Dunn et al. (1993) Plant Mol Biol 23: 221-5 Aguilar et al. (1991) Plant Mol 
Biol 17: 351-60 

F. ENV polyprotein 

[2218] This family comprises the envelope or coat proteins known from a number of different retroviruses. In mammalian 
species, retroviruses are responsible for diseases such as leukemia and HIV. In plants, retroviruses are known 
in both monocot (e.g. Zeon-1) and dicot (e.g. Arabidopsis and tobacco) species and have been shown to induce mutant 
alleles at new loci. Engineering of plant ENV proteins may allow mobilization or targeting of endogenous or introduced 
retroviruses, in essence generating a new method for mutant production, gene tagging and the like. 

Ref: Mamoun et al. (1990) J Virol 64: 4180-8 Grandbastien et al. (1989) Nature 337: 376-80 Wright and Voytas (1998) 
Genetics 149: 703-15 


G. Glycosyl hydr9 

[2219] Proteins having this domain (previously known as the glycosyl hydrolase family 5 domain) catalyze the 
endohydrolysis of 1,4-β-D-glucosidic linkages in cellulose. Numerous plant proteins with this domain exist and are 
expressed in an organ-specific manner. They are involved in the fruit ripening process, in cell elongation and plant 
reproduction. Modulation of the activity of these proteins, either by over- or under-expression or by mutation of the 
polypeptide, could be used to affect post-harvest physiology (e.g. rate of ripening) or for engineering reproductive 
sterility. 

Ref: Giorda et al. (1990) Biochemistry 29: 7264-9 Tucker et al. (1986) Plant Physiol 88: 1257-62 Shani et al. (1997) 
40; 037-42 Milligan and Gasser (1995) Plant Mol Biol 28: 691-711 

H. Glycosyl hydr14 

[2220] The β-amylases (family 14 of glycosyl hydrolases) catalyze the hydrolysis of 1,4-α-glucosidic linkages in 
polysaccharides and remove successive maltose units from the non-reducing ends of the chains. Mutants of β-amylase 
exhibited altered degradation of starch throughout the diurnal cycle. In addition, the mutant phenotype 
indicated that these enzymes not only affect carbohydrate metabolism/catabolism, but also influence the amount of 
pigment stored within particular cells. Manipulation of the β-amylase genes enables control of plant pigmentation (for 
example, fibre pigment in cotton) as well as carbohydrate synthesis and degradation. 

Ref: Zeeman et al. (1998) Plant J 15: 357-65 Hirano and Nakamura (1997) Plant Physiol 114: 5675-82 Kitamoto et 
al. (1988) J Bacteriol 170: 5848-54 

I. Glycosyl hydr15 

[2221] Glycosyl hydrolases from family 15 (such as 1,4-alpha-D-glucan glucohydrolase) catalyze the hydrolysis of 
terminal 1,4-linked alpha-D-glucose residues successively from the non-reducing ends of the chains, resulting in the 
release of β-D-glucose. In plants, these proteins have been tied to the mobilization of the xyloglucan stored in the 
cotyledonary cell walls. Proteins such as these could be varied to affect the rate of plant growth (for example during 
germination), storage and/or use of glucose and other sugars by plant tissues and alteration of the properties, such 
as elasticity, of plant cell walls. 




Ref: Crombie et al. (1998) Plant J 15: 27-38 Hata et al. (1991) Agric Biol Chem 55: 941-9 
J. Glycosyl hydr20 


[2222] Members of the family 20 glycosyl hydrolases catalyze the hydrolysis of terminal non-reducing N-acetyl-β-D-
hexosamine residues in N-acetyl-β-D-hexosaminides. N-acetyl-β-glucosaminidase belongs to this family and exists in 
several different forms (consisting of various combinations of alpha and beta chains) depending on the organism. 
Family 20 glycosyl hydrolases have been implicated in lysosomal storage diseases (such as Sandhoff disease and 
glycogen storage disease) in humans. These types of proteins are also responsible for the hydrolysis of chitin. In plants, 
these proteins could be useful in controlling carbohydrate catabolism, thereby influencing the amount of sugars available 
for storage and/or use in other metabolic pathways. In addition, it is possible that such proteins could be used to 
engineer an endogenous insect protection mechanism, e.g. by secretion of a chitin-hydrolyzing composition by the 
plant. 

Ref: Graham et al. (1988) J Biol Chem 263: 16823-9 O'Dowd et al. (1988) Biochemistry 27: 5216-26 

K. HMG box 

[2223] The HMG box is a novel type of DNA-binding domain found in a diverse group of proteins. Numerous plant 
proteins contain this domain, such as the HMG1/2-like proteins. The expression of some of these HMG proteins appears 
to be regulated by circadian rhythms and in a light-dependent manner, occurring at higher levels in roots, for example, 
and lower levels in light-grown tissues such as cotyledons. Generally, HMG proteins are thought to influence transcription 
regulation. In plants, HMGs are believed to have a role in maintaining patterns of circadian-regulated expression 
for other genes, suggesting that these proteins could be exploited to control growth and development. 


Ref: Laudet et al. (1993) Nucleic Acids Res 21: 2493-501 Zheng et al. (1993) Plant Mol Biol 23: 813-23 Grasser et 
al. (1993) Plant Mol Biol 23: 619-25 

L. IL2 


[2224] Interleukin-2 (IL-2) is produced in mammals by T cells in response to antigenic or mitogenic stimulation and 
is crucial for proper regulation and functioning of the immune response. IL-2 is capable of stimulating B cells, monocytes, 
lymphokine-activated killer cells, natural killer cells and glioma cells. Plant extracts have also been shown to stimulate 
the immune system (for example, mistletoe therapy for human cancer). It is known that IL-2 is involved in feedback 
inhibition pathways that impact the inflammatory response as well as the growth inhibition of tumor-reactive T cells. 
Plant proteins containing IL-2-like sequences are useful as immunity-based therapeutics, acting in a manner similar 
to IL-2 in mammals. 

Ref: Heike et al. (1997) Scand J Immunol 45: 221-6 Ariel et al. (1998) J Immunol 161: 2465-72 Schink (1997) 
Anticancer Drugs 8 Suppl 1: S47-51 

M. Oxidored FMN 

[2225] NADPH dehydrogenases catalyze the reaction NADPH + acceptor = NADP(+) + reduced acceptor. One member 
of this family is yeast old yellow enzyme (OYE) and is thought to be involved in oxylipin metabolism. A second 
yeast family member is a protein that binds estrogen binding protein (EBP) in addition to exhibiting oxidoreductase 
activity. An Arabidopsis homolog to OYE has been described and estrogen binding proteins in plants have been reported. 
Plant proteins from this class have the potential to be used to modify lipid metabolism/catabolism. These proteins 
may also have use as therapeutics for breast and prostate cancer, and other abnormal growth in steroid-sensitive 
tissues. 

Ref: Baker et al. (1998) Proc Soc Exp Biol Med 217: 317-21 Schaller and Weiler (1997) J Biol Chem 272: 28066-72 
Madani et al. (1994) PNAS USA 91: 922-6 

N. Oxidored_q2 

[2226] The NADH-plastoquinone oxidoreductases catalyze the reaction NADH + plastoquinone = NAD(+) + plasto- 
quinol. In plants these reactions occur in the chloroplast and are believed to participate in a chloroplast respiratory 
system. Here, the NDH complex is postulated to act as a valve to remove excess reduction equivalents in the 
chloroplasts. Manipulation of these proteins may improve the rate or efficiency of photosynthesis. 

Ref: Burrows et al. (1998) EMBO J 17: 868-76 Kofer et al (1998) Mol Gen Genet 258: 166-73 Maier et al. (1995) J 
Mol Biol 251: 614-28 

O. PABP 

[2227] Polyadenylate binding proteins bind the poly(A) tail of mRNA. Plants, as exemplified by Arabidopsis, contain 
numerous PABP genes that are expressed in an organ-specific manner. For example, PABP2 is functional in roots and 
shoots, while PABP5 is expressed predominantly in immature flowers. The PABP proteins are implicated in numerous 
aspects of posttranscriptional regulation including mRNA turnover and translational initiation. Control of activity of PABP 
proteins provides the ability to control the expression of various genes in particular organs during development. 

Ref: Hilson et al (1993) Plant Physiol 103: 525-33 Belostotsky and Meagher (1993) PNAS USA 90: 6686-90 

P. Parvo coat 

[2228] Parvoviruses are linear single-stranded DNA viruses that are encapsulated by three capsid proteins. Plants 
are susceptible to infection by single-stranded DNA viruses such as Maize streak virus (MSV) and various Gemini 
viruses. The coat proteins in these plant viruses are critical to the virus life cycle within the plant. For example, the coat 
protein of MSV is thought to be involved in intra- and inter-cellular movement within the plant. Engineering of proteins 
having similarity to parvoviral coat proteins, especially to produce proteins that interfere with maturation of the virus 
particle, enables the production of plants having better resistance to natural plant single-stranded DNA viruses. 

Ref: Liu et al. (1997) J Gen Virol 78: 1265-70 Rohde et al. (1990) Virology 176: 648-51 
Q. Pkinase C 

[2229] Plant serine/threonine protein kinases possessing this domain are expressed in all tissues and are known to 
undergo serine-specific autophosphorylation and specifically phosphorylate two ribosomal proteins, P14 and P16. During 
development, these proteins predominate during high metabolic activity in growing buds, root tips, leaf margins 
and germinating seeds. They are thought to be involved in the control of plant growth and development. In addition, 
two genes encoding proteins from this family have been described that help plant cells adapt during cold or high salt 
stresses. Consequently, engineering Pkinase C proteins provides a way to control general growth/development of the 
plant as well as a means to provide endogenous protection against environmental stresses. 

Ref: Zhang et al. (1994) J Biol Chem 269: 17586-92 Mizoguchi et al. (1995) FEBS Lett 358: 199-204 

R. REV 

[2230] The REV proteins act post-transcriptionally to relieve negative repression of GAG and ENV production in 
retroviruses such as Human Immunodeficiency Virus type I (HIV-1). Plants contain retrovirus-like viruses such as 
pararetroviruses and retrotransposons (i.e. transposons having long terminal repeats). Plant retrotransposons in 
particular have been used to create mutations at various loci, thereby permitting gene isolation, gene tagging and the like. 
Manipulation of plant REV proteins enables control of transposition frequencies of corresponding transposable ele- 
ments and provides a new tool for genetic engineering of plants. 

Ref: Sodroski et al. (1986) Nature 321: 412-7 Franchini et al. (1989) PNAS USA 86: 2433-7 Marquet et al. (1995) 
77: 113-24 Grandbastien et al. (1989) Nature 337: 376-80 Wright and Voytas (1998) Genetics 149: 703-15 

S. RuBisCo small 

[2231] Ribulose 1,5-bisphosphate carboxylase/oxygenase (RuBisCo) catalyzes the initial step in the C3 photosynthetic 
carbon reduction cycle, adding carbon dioxide to D-ribulose 1,5-bisphosphate to form two molecules of 
3-phospho-D-glycerate. RuBisCo is composed of two subunits, one large which is synthesized in the chloroplast, and one 
small which is synthesized in the cytoplasm and then transported into the chloroplast. The expression of the small 
subunit of RuBisCo is light regulated. Manipulation of these proteins could increase the efficiency of photosynthesis 
or allow alterations in developmental timing. 


Ref: Giuliano et al. (1988) PNAS USA 85: 7089-93 Dedonder et al. (1993) Plant Physiol 101: 801-8 


T. Sialyltransf 

[2232] Members of the CMP-N-acetylneuraminate-β-galactosamide-α-2,3-sialyltransferase family catalyze the 
following reaction: 

CMP-N-acetylneuraminate + β-D-galactosyl-1,3-N-acetyl-α-D-galactosaminyl-R = CMP + α-N-acetylneuraminyl-2,3-β-
D-galactosyl-1,3-N-acetyl-α-D-galactosaminyl-R. 

These proteins are thought to be responsible for the synthesis of 
the sequence NeuAc-α-2,3-Gal-β-1,3-GalNAc- found on sugar chains O-linked to threonine or serine and also as a 
terminal sequence on certain gangliosides in mammalian cells. In plants, glycosyltransferases in the Golgi apparatus 
synthesize cell wall polysaccharides and elaborate the complex glycans of glycoproteins. Engineering of plant 
sialyltransferases allows targeting of proteins to particular cellular locations or enables the making of changes in cell wall 
structure. 

Ref: Wee et al. (1998) Plant Cell 10: 1759-68 Lee et al. (1994) J Biol Chem 269: 10028-33 Kitagawa and Paulson 
(1994) J Biol Chem 269: 1394-401 

U. Signal 

[2233] Many plant proteins in this family contain sequences similar to those found in both components of the prokary- 
otic family of signal transducers known as the two-component systems. This suggests that activation may require a 
transfer of a phosphate group between the transmitter domain and the receiver domain. One family member in Arabi- 
dopsis appears to be involved in ethylene (a plant hormone) signal transduction. Other proteins in this family appear 
to be involved in the regulation of gene transcription under conditions of environmental stress. Signal proteins can be 
exploited to affect plant growth and development and/or control plant responses to stress conditions such as cold, 
nutrient availability, etc. 

Ref: Chang et al. (1993) Science 262: 539-44 Nagaya et al. (1993) Gene 131: 119-124 Gottfert et al. (1990) PNAS 
USA 87: 2680-4 

V. vMSA 

[2234] vMSA proteins are major surface antigens present on the envelope of various retroviruses. Surface antigens 
of retroviruses are often involved in tropism of the virus. Plants contain retrovirus-like viruses such as pararetroviruses 
and retrotransposons (i.e. transposons having long terminal repeats). Plant retrotransposons in particular have 
been used to create mutants at various loci, thereby permitting gene isolation, gene tagging and the like. Manipulation 
of plant vMSA proteins enables control of tropism of plant retroviruses that might be used as genetic engineering tools, 
thus enabling targeting of the virus to particular species and/or tissues of plants. 

Ref: Okamoto et al. (1988) J Gen Virol 69: 2575-83 Grandbastien et al. (1989) Nature 337: 376-80 Wright and Voytas 
(1998) Genetics 149: 703-15 

W. zf-CCCH 

[2235] This family of proteins is defined by having two CX(8)CX(5)CX(3)H-type zinc finger domains. These proteins 
cover a broad range of functions. For example, the COP1 protein acts as a repressor of photomorphogenesis in dark- 
ness; light stimuli abolish this suppressive action. In addition, COP1 protein can function as a negative transcriptional 
regulator capable of direct interaction with components of the G-protein signaling pathway. As a second example, a 
zf-CCCH protein identified in Arabidopsis appears to be involved in the resistance to DNA damage induced by UV light 
and chemical DNA-damaging agents. Overexpression of this class of proteins permits production of plants that are 
better suited to adverse environments. Manipulation of expression of zf-CCCH proteins functioning as transcriptional 
regulators, such as COP1, enables manipulation of some signal transduction pathways. 
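
The CX(8)CX(5)CX(3)H spacing that defines this family can be written as a simple pattern search. The following is a minimal sketch, assuming a one-letter amino acid string as input; the function names and the toy sequence are hypothetical and are not taken from REF AND SEQ TABLES 1 AND 2.

```python
import re

# C-X(8)-C-X(5)-C-X(3)-H, where X is any residue. The pattern is wrapped in a
# lookahead so that overlapping occurrences are all reported.
CCCH_LOOKAHEAD = re.compile(r"(?=C.{8}C.{5}C.{3}H)")


def find_ccch(protein):
    """Return the 0-based start position of every CCCH-type motif."""
    return [m.start() for m in CCCH_LOOKAHEAD.finditer(protein.upper())]


def is_zf_ccch_candidate(protein, min_domains=2):
    """True if the sequence carries at least `min_domains` motifs.

    The default of two follows the family definition given in the text.
    """
    return len(find_ccch(protein)) >= min_domains


# Hypothetical toy sequence with two motifs separated by a short linker.
motif = "C" + "A" * 8 + "C" + "A" * 5 + "C" + "A" * 3 + "H"
toy = "MA" + motif + "GG" + motif
print(find_ccch(toy), is_zf_ccch_candidate(toy))
```

A pattern hit of this kind only flags a candidate; it says nothing about whether the residues actually coordinate zinc.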

Ref: Pang et al. (1993) Nucleic Acids Res 21: 1647-53 Deng et al. (1992) Cell 71: 791-801 

X. zf-RanBP 

[2236] Proteins falling within this category contain many X-X-F-G and X-F-X-F-G repeats, and may contain 
RANBP1-like or PPIase domains. Plant proteins having domains similar to these include PAS1 and GMSTI. PAS1 has 
been shown to have dramatic developmental effects that appear to be correlated with both cell division and cell wall 
elongation. GMSTI has high identity to the yeast STI stress-inducible gene and has been shown to be heat inducible. 
Proteins such as these may be useful for controlling growth and form of development. 

Ref: Vittorioso et al. (1998) Mol Cell Biol 18: 3034-43 Hernandez Torres et al. (1995) 27: 1221-6 

Y. Peptidase M48. 

[2237] Proteins belonging to this peptidase family are metalloproteases that bind zinc as a cofactor and are located 
in the membranes of the endoplasmic reticulum. They function in NH2-terminal proteolytic processing, as shown for 
the yeast STE24 gene product. This gene is required for the correct processing of a-factor, a yeast pheromone. Family 
M48 peptidases also appear to be required for some prenylation reactions, mediating COOH-terminal CAAX processing. 
Prenylation reactions are believed to be involved in the regulation of protein-protein and protein-membrane interactions. 
As an example, RAS GTPase activity is regulated in part by localization to the inner side of the plasma membrane 
upon prenylation. In plants, proteins from this family could be involved in pollen-stigma interactions such as 
those mediating self-pollination vs. outcrossing, or could be members of several secondary metabolism pathways. 

plants, these proteins are located in the thylakoid membranes of the chloroplasts, their expression is light regulated 
and they are thought to be involved in degradation of soluble stromal proteins and turn-over of thylakoid proteins [1]. 
Manipulation of expression and structure of these proteins would have effects on the efficiency of photosynthesis and 
the development of chloroplasts. 
[2246] Refs 

1 Lindahl M, Tabak S, Cseke L, Pichersky E, Andersson B, Adam Z (1996) J Biol Chem 271: 29329-34. 
Ae. UPF0051 

[2247] There is some evidence that, in plants, proteins in this family are involved in ATP synthesis in chloroplasts 
[1, 2]. Mutations in these proteins or altering their expression would affect the efficiency of photosynthesis and energy 
production. 
[2248] Refs 

1 Kostrzewa M, Zetsche K (1992) J Mol Biol 227: 961-70. 

2 Kostrzewa M, Zetsche K (1993) Plant Mol Biol 23: 67-76. 

Af. E7 

[2249] Papillomaviruses are encapsulated double-stranded DNA viruses. The Papillomavirus early protein 7 (E7) is 
known as a potent immortalizing and transforming agent. Transformation by E7 is thought to be mediated by the physical 
association of E7 with cellular proteins regulating entry into the cell cycle [1]. The result is entry into the cell cycle and 
suppression of terminal differentiation in mammalian cells. Thus, engineering of proteins having similarity to 
papillomavirus E7 protein enables the production of plants having altered cellular proliferation characteristics and possibly 
altered morphology. For example, overexpression of E7-like proteins would be expected to result in proliferation of 
cells of the tissue in which the E7 protein is expressed, perhaps with suppression of differentiation events. Thus, for 
example, overexpression of E7-like proteins in meristem cells can result in taller plants and suppression of leafing and/ 
or flowering. 
[2250] Refs 

1 Zwerschke W, Jansen-Durr P (2000) Adv Cancer Res 78: 1-29. 
Ag. Peptidase U7 

[2251] This protein is known to be an integral membrane protein in the cyanobacterium Synechocystis where it 
functions to digest cleaved signal peptides [1]. This activity is necessary to maintain proper secretion of mature proteins 
across the membrane. In higher plants this protein may be present in the plastid or chloroplast membranes where it 
would function by enabling protein movement into and out of the chloroplasts. Mutations in this protein would be 
expected to affect the development of plastids, including chloroplasts, or alter the energy transfer system within the 
chloroplasts, thereby affecting growth and development. 
[2252] Refs 

1 Kaneko T, Sato S, Kotani H, Tanaka A, Asamizu E, Nakamura Y, Miyajima N, Hirosawa M, Sugiura M, Sasamoto S, 
Kimura T, Hosouchi T, Matsuno A, Muraki A, Nakazaki N, Naruo K, Okumura S, Shimpo S, Takeuchi C, Wada T, 
Watanabe A, Yamada M, Yasuda M, Tabata S (1996) DNA Res 3:109-36. 

A. Activities of Polypeptides Comprising Signal Peptides 

[2253] Polypeptides comprising signal peptides are a family of proteins that are typically targeted (1) to a particular 
organelle or intracellular compartment, (2) to interact with a particular molecule or (3) for secretion outside of a host cell. 
Examples of polypeptides comprising signal peptides include, without limitation, secreted proteins, soluble proteins, 
receptors, proteins retained in the ER, etc. 

[2254] These proteins comprising signal peptides are useful to modulate ligand-receptor interactions, cell-to-cell 
communication, signal transduction, intracellular communication, and activities and/or chemical cascades that take 
place in an organism outside or within any particular cell. 

[2255] One class of such proteins is soluble proteins which are transported out of the cell. These proteins can act 
as ligands that bind to a receptor to trigger signal transduction or to permit communication between cells. 

[2256] Another class is receptor proteins, which also comprise a retention domain that lodges the receptor protein in 
the membrane when the cell transports the receptor to the surface of the cell. Like the soluble ligands, receptors can 
also modulate signal transduction and communication between cells. 

[2257] In addition, the signal peptide itself can serve as a ligand for some receptors. An example is the interaction of 
the ER targeting signal peptide with the signal recognition particle (SRP). Here, the SRP binds to the signal peptide, 
halting translation, and the resulting SRP complex then binds to docking proteins located on the surface of the ER, 
prompting transfer of the protein into the ER. 

[2258] A description of signal peptide residue composition is given below in Subsection IV.C.1. 

III. Methods of Modulating Polypeptide Production 

[2259] It is contemplated that polynucleotides of the invention can be incorporated into a host cell or in-vitro system 
to modulate polypeptide production. For instance, the SDFs prepared as described herein can be used to prepare 
expression cassettes useful in a number of techniques for suppressing or enhancing expression. 

[2260] As an example, polynucleotides comprising sequences to be transcribed, such as coding sequences, of the 
present invention can be inserted into nucleic acid constructs to modulate polypeptide production. Typically, such 
sequences to be transcribed are heterologous to at least one element of the nucleic acid construct to generate a chimeric 
gene or construct. 

[2261] Another example of useful polynucleotides are nucleic acid molecules comprising regulatory sequences of 
the present invention. Chimeric genes or constructs can be generated when the regulatory sequences of the invention 
are linked to heterologous sequences in a vector construct. Within the scope of the invention are such chimeric genes and/or 
constructs. 

[2262] Also within the scope of the invention are nucleic acid molecules, whereof at least a part or fragment of these 
DNA molecules are presented in REF AND SEQ TABLES 1 AND 2 of the present application, and wherein the coding 
sequence is under the control of its own promoter and/or its own regulatory elements. Such molecules are useful for 
transforming the genome of a host cell or an organism regenerated from said host cell for modulating polypeptide 
production. 

[2263] Additionally, a vector capable of producing the oligonucleotide can be inserted into the host cell to deliver the 
oligonucleotide. 

[2264] More detailed descriptions of components to be included in vector constructs are provided both above and 
below. 

[2265] Whether the chimeric vectors or native nucleic acids are utilized, such polynucleotides can be incorporated 
into a host cell to modulate polypeptide production. Native genes and/or nucleic acid molecules can be effective when 
exogenous to the host cell. 

[2266] Methods of modulating polypeptide expression include, without limitation: 
Suppression methods, such as 

Antisense 
Ribozymes 
Co-suppression 
Insertion of Sequences into the Gene to be Modulated 
Regulatory Sequence Modulation 

as well as Methods for Enhancing Production, such as 

Insertion of Exogenous Sequences; and 
Regulatory Sequence Modulation. 

III.A. Suppression 

[2267] Expression cassettes of the invention can be used to suppress expression of endogenous genes which comprise 
the SDF sequence. Inhibiting expression can be useful, for instance, to tailor the ripening characteristics of a fruit 
(Oeller et al., Science 254:437 (1991)) or to influence seed size (WO98/07842) or to provoke cell ablation (Mariani et 
al., Nature 357: 384-387 (1992)). 

[2268] As described above, a number of methods can be used to inhibit gene expression in plants, such as antisense, 
ribozyme, introduction of exogenous genes into a host cell, insertion of a polynucleotide sequence into the coding 
sequence and/or the promoter of the endogenous gene of interest, and the like. 

III.A.1. Antisense 

[2269] An expression cassette as described above can be transformed into a host cell or plant to produce an antisense 
strand of RNA. For plant cells, antisense RNA inhibits gene expression by preventing the accumulation of mRNA which 
encodes the enzyme of interest, see, e.g., Sheehy et al., Proc. Nat. Acad. Sci. USA, 85:8805 (1988), and Hiatt et al., 
U.S. Patent No. 4,801,340. 
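
The relationship between a coding sequence and the antisense RNA transcribed from it can be illustrated computationally. The following is a minimal sketch, assuming the input is the sense (coding-strand) DNA written 5' to 3' and containing only A, C, G and T; the function names and the example sequence are hypothetical.

```python
# Translation table for complementing unambiguous DNA bases.
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(dna):
    """Return the reverse complement of a DNA string, written 5' to 3'."""
    return dna.translate(COMPLEMENT)[::-1]


def antisense_rna(sense_dna):
    """Return the RNA produced when the insert is cloned in reverse orientation.

    The antisense transcript has the sequence of the reverse complement of the
    sense strand, with U in place of T, so it can base-pair with the mRNA.
    """
    return reverse_complement(sense_dna).upper().replace("T", "U")


# Hypothetical six-base fragment of a coding sequence.
print(antisense_rna("ATGGCC"))
```

In a real construct the antisense orientation is achieved by inverting the insert relative to the promoter; the string operation above only shows which sequence results.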

III.A.2. Ribozymes 

[2270] Similarly, ribozyme constructs can be transformed into a plant to cleave mRNA and down-regulate translation. 

III.A.3. Co-Suppression 

[2271] Another method of suppression is by introducing an exogenous copy of the gene to be suppressed. Introduc- 
tion of expression cassettes in which a nucleic acid is configured in the sense orientation with respect to the promoter 
has been shown to prevent the accumulation of mRNA. A detailed description of this method is described above. 

III.A.4. Insertion of Sequences into the Gene to be Modulated 

[2272] Yet another means of suppressing gene expression is to insert a polynucleotide into the gene of interest to 
disrupt transcription or translation of the gene. 

[2273] Homologous recombination could be used to target a polynucleotide insert to a gene using the Cre-Lox system 
(A.C. Vergunst et al., Nucleic Acids Res. 26:2729 (1998), A.C. Vergunst et al., Plant Mol. Biol. 38:393 (1998), H. Albert 
et al., Plant J. 7:649 (1995)). 

[2274] In addition, random insertion of polynucleotides into a host cell genome can also be used to disrupt the gene 
of interest (Azpiroz-Leehan et al., Trends in Genetics 13:152 (1997)). In this method, screening for clones from a library 
containing random insertions is preferred for identifying those that have polynucleotides inserted into the gene of 
interest. Such screening can be performed using probes and/or primers described above based on sequences from REF 
AND SEQ TABLES 1 AND 2, fragments thereof, and substantially similar sequences thereto. The screening can also 
be performed by selecting clones or any transgenic plants having a desired phenotype. 

III.A.5. Regulatory Sequence Modulation 

[2275] The SDFs described in REF and SEQ TABLES 1 and 2, and fragments thereof are examples of nucleotides 
of the invention that contain regulatory sequences that can be used to suppress or inactivate transcription and/or 
translation from a gene of interest as discussed in I.C.5. 

III.A.6. Genes Comprising Dominant-Negative Mutations 

[2276] When suppression of production of the endogenous, native protein is desired, it is often helpful to express a 
gene comprising a dominant-negative mutation. Production of protein variants from genes comprising dominant-negative 
mutations is a useful tool for research. Genes comprising dominant-negative mutations can produce a 
variant polypeptide which is capable of competing with the native polypeptide, but which does not produce the native 
result. Consequently, overexpression of genes comprising these mutations can titrate out an undesired activity of the 
native protein. For example, the product from a gene comprising a dominant-negative mutation of a receptor can be 
used to constitutively activate or suppress a signal transduction cascade, allowing examination of the phenotype and 
thus the trait(s) controlled by that receptor and pathway. Alternatively, the protein arising from the gene comprising a 
dominant-negative mutation can be an inactive enzyme still capable of binding to the same substrate as the native 
protein and therefore competes with such native protein. 

[2277] Products from genes comprising dominant-negative mutations can also act upon the native protein itself to 
prevent activity. For example, the native protein may be active only as a homo-multimer or as one subunit of a hetero-
multimer. Incorporation of an inactive subunit into the multimer with native subunit(s) can inhibit activity. 
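
The effect of an inactive subunit on a multimer can be made concrete with a simple probability estimate. The following is a minimal sketch under two assumptions that are not stated in the text: subunits assemble at random from the mixed pool, and a single mutant subunit is enough to inactivate a complex. The function name is hypothetical.

```python
def fraction_fully_native(mutant_fraction, subunits_per_complex):
    """Estimate the fraction of complexes built only from native subunits.

    Assumes random assembly, so each of the n positions is native with
    probability (1 - f), giving (1 - f) ** n for the whole complex.
    """
    if not 0.0 <= mutant_fraction <= 1.0:
        raise ValueError("mutant_fraction must be between 0 and 1")
    if subunits_per_complex < 1:
        raise ValueError("subunits_per_complex must be at least 1")
    return (1.0 - mutant_fraction) ** subunits_per_complex


# Equal amounts of native and mutant subunit in a tetramer.
print(fraction_fully_native(0.5, 4))
```

Under these assumptions the loss of activity grows steeply with the number of subunits, which is one way to read the statement that a dominant-negative variant can titrate out the native activity.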
[2278] Thus, gene function can be modulated in host cells of interest by inserting into these cells vector constructs 
comprising a gene comprising a dominant-negative mutation. 

III.B. Enhanced Expression 

[2279] Enhanced expression of a gene of interest in a host cell can be accomplished by either (1) insertion of an 
exogenous gene; or (2) promoter modulation. 


III.B.1. Insertion of an Exogenous Gene 

[2280] Insertion of an expression construct encoding an exogenous gene can boost the number of gene copies 
expressed in a host cell. 

[2281] Such expression constructs can comprise genes that either encode the native protein that is of interest or 
that encode a variant that exhibits enhanced activity as compared to the native protein. Such genes encoding proteins 
of interest can be constructed from the sequences from REF AND SEQ TABLES 1 AND 2, fragments thereof, and 
substantially similar sequences thereto. 

[2282] Such an exogenous gene can include either a constitutive promoter permitting expression in any cell in a host 
organism or a promoter that directs transcription only in particular cells or times during a host cell life cycle or in response 
to environmental stimuli. 

III.B.2. Regulatory Sequence Modulation 

[2283] The SDFs of REF and SEQ TABLES 1 AND 2, and fragments thereof, contain regulatory sequences that can 
be used to enhance expression of a gene of interest. For example, some of these sequences contain useful enhancer 
elements. In some cases, duplication of enhancer elements or insertion of exogenous enhancer elements will increase 
expression of a desired gene from a particular promoter. As other examples, all 11 promoters require binding of a 
regulatory protein to be activated, while some promoters may need a protein that signals a promoter binding protein 
to expose a polymerase binding site. In either case, over-production of such proteins can be used to enhance expression 
of a gene of interest by increasing the activation time of the promoter. 

[2284] Such regulatory proteins are encoded by some of the sequences in REF AND SEQ TABLES 1 AND 2, fragments 
thereof, and substantially similar sequences thereto. 

[2285] Coding sequences for these proteins can be constructed as described above. 


IV. Gene Constructs and Vector Construction 

[2286] To use isolated SDFs of the present invention or a combination of them or parts and/or mutants and/or fusions
of said SDFs in the above techniques, recombinant DNA vectors which comprise said SDFs and are suitable for
transformation of cells, such as plant cells, are usually prepared. The SDF construct can be made using standard
recombinant DNA techniques (Sambrook et al. 1989) and can be introduced to the species of interest by
Agrobacterium-mediated transformation or by other means of transformation (e.g., particle gun bombardment) as referenced below.
[2287] The vector backbone can be any of those typical in the art such as plasmids, viruses, artificial chromosomes,
BACs, YACs and PACs and vectors of the sort described by

(a) BAC: Shizuya et al., Proc. Natl. Acad. Sci. USA 89: 8794-8797 (1992); Hamilton et al., Proc. Natl. Acad. Sci.
USA 93: 9975-9979 (1996);

(b) YAC: Burke et al., Science 236: 806-812 (1987);

(c) PAC: Sternberg N. et al., Proc. Natl. Acad. Sci. USA 87(1): 103-7 (1990);

(d) Bacteria-Yeast Shuttle Vectors: Bradshaw et al., Nucl. Acids Res. 23: 4850-4856 (1995);

(e) Lambda Phage Vectors: Replacement Vector, e.g., Frischauf et al., J. Mol. Biol. 170: 827-842 (1983); or Insertion
vector, e.g., Huynh et al., In: Glover NM (ed) DNA Cloning: A Practical Approach, Vol. 1, Oxford: IRL Press (1985);

(f) T-DNA gene fusion vectors: Walden et al., Mol. Cell Biol. 1: 175-194 (1990); and

(g) Plasmid vectors: Sambrook et al., infra.

[2288] Typically, a vector will comprise the exogenous gene, which in its turn comprises an SDF of the present
invention to be introduced into the genome of a host cell, and which gene may be an antisense construct, a ribozyme
construct, chimeraplast, or a coding sequence with any desired transcriptional and/or translational regulatory
sequences, such as promoters, UTRs, and 3' end termination sequences. Vectors of the invention can also include origins of
replication, scaffold attachment regions (SARs), markers, homologous sequences, introns, etc.

[2289] A DNA sequence coding for the desired polypeptide, for example a cDNA sequence encoding a full length
protein, will preferably be combined with transcriptional and translational initiation regulatory sequences which will
direct the transcription of the sequence from the gene in the intended tissues of the transformed plant.
[2290] For example, for over-expression, a plant promoter fragment may be employed that will direct transcription
of the gene in all tissues of a regenerated plant. Alternatively, the plant promoter may direct transcription of an SDF of
the invention in a specific tissue (tissue-specific promoters) or may be otherwise under more precise environmental
control (inducible promoters).

[2291] If proper polypeptide production is desired, a polyadenylation region at the 3'-end of the coding region is
typically included. The polyadenylation region can be derived from the natural gene, from a variety of other plant genes,
or from T-DNA.

[2292] The vector comprising the sequences from genes or SDFs of the invention may comprise a marker gene that
confers a selectable phenotype on plant cells. The vector can include promoter and coding sequence, for instance.
For example, the marker may encode biocide resistance, particularly antibiotic resistance, such as resistance to
kanamycin, G418, bleomycin, or hygromycin, or herbicide resistance, such as resistance to chlorsulfuron or phosphinothricin.

IV.A. Coding Sequences 

[2293] Generally, the sequence in the transformation vector and to be introduced into the genome of the host cell
does not need to be absolutely identical to an SDF of the present invention. Also, it is not necessary for it to be full
length, relative to either the primary transcription product or fully processed mRNA. Furthermore, the introduced
sequence need not have the same intron or exon pattern as a native gene. Also, heterologous non-coding segments can
be incorporated into the coding sequence without changing the desired amino acid sequence of the polypeptide to be
produced.

IV.B. Promoters 

[2294] As explained above, introducing an exogenous SDF from the same species or an orthologous SDF from
another species can modulate the expression of a native gene corresponding to that SDF of interest. Such an SDF
construct can be under the control of either a constitutive promoter or a highly regulated inducible promoter (e.g., a
copper inducible promoter). The promoter of interest can initially be either endogenous or heterologous to the species
in question. When re-introduced into the genome of said species, such promoter becomes exogenous to said species.
Over-expression of an SDF transgene can lead to co-suppression of the homologous endogenous sequence, thereby
creating some alterations in the phenotypes of the transformed species, as demonstrated by similar analysis of the
chalcone synthase gene (Napoli et al., Plant Cell 2:279 (1990) and van der Krol et al., Plant Cell 2:291 (1990)). If an
SDF is found to encode a protein with desirable characteristics, its over-production can be controlled so that its
accumulation can be manipulated in an organ- or tissue-specific manner utilizing a promoter having such specificity.
[2295] Likewise, if the promoter of an SDF (or an SDF that includes a promoter) is found to be tissue-specific or
developmentally regulated, such a promoter can be utilized to drive or facilitate the transcription of a specific gene of
interest (e.g., seed storage protein or root-specific protein). Thus, the level of accumulation of a particular protein can
be manipulated or its spatial localization in an organ- or tissue-specific manner can be altered.

IV.C. Signal Peptides

[2296] SDFs of the present invention containing signal peptides are indicated in the REF and SEQ TABLES. In some
cases it may be desirable for the protein encoded by an introduced exogenous or orthologous SDF to be targeted (1)
to a particular organelle intracellular compartment, (2) to interact with a particular molecule such as a membrane
molecule, or (3) for secretion outside of the cell harboring the introduced SDF. This will be accomplished using a signal
peptide.

[2297] Signal peptides direct protein targeting, are involved in ligand-receptor interactions and act in cell to cell
communication. Many proteins, especially soluble proteins, contain a signal peptide that targets the protein to one of
several different intracellular compartments. In plants, these compartments include, but are not limited to, the
endoplasmic reticulum (ER), mitochondria, plastids (such as chloroplasts), the vacuole, the Golgi apparatus, protein storage
vesicles (PSV) and, in general, membranes. Some signal peptide sequences are conserved, such as the
Asn-Pro-Ile-Arg amino acid motif found in the N-terminal propeptide signal that targets proteins to the vacuole (Marty (1999)
The Plant Cell 11: 587-599). Other signal peptides do not have a consensus sequence per se, but are largely composed
of hydrophobic amino acids, such as those signal peptides targeting proteins to the ER (Vitale and Denecke (1999)
The Plant Cell 11: 615-628). Still others do not appear to contain either a consensus sequence or an identified common
secondary sequence, for instance the chloroplast stromal targeting signal peptides (Keegstra and Cline (1999) The
Plant Cell 11: 557-570). Furthermore, some targeting peptides are bipartite, directing proteins first to an organelle and
then to a membrane within the organelle (e.g. within the thylakoid lumen of the chloroplast; see Keegstra and Cline
(1999) The Plant Cell 11: 557-570). In addition to the diversity in sequence and secondary structure, placement of the
signal peptide is also varied. Proteins destined for the vacuole, for example, have targeting signal peptides found at
the N-terminus, at the C-terminus and at a surface location in mature, folded proteins. Signal peptides also serve as
ligands for some receptors.
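
As an illustration of how a conserved targeting motif of the kind mentioned above might be located in a candidate protein, the following is a minimal sketch. It assumes the Asn-Pro-Ile-Arg motif is written in one-letter code as NPIR and is searched only within an N-terminal window; the window length of 50 residues and the function name are hypothetical choices and are not taken from the specification. A plain string match of this kind is not a targeting predictor and says nothing about the signal peptides that lack a consensus sequence.

```python
import re

# Assumption: Asn-Pro-Ile-Arg in one-letter amino acid code.
VACUOLAR_MOTIF = re.compile("NPIR")


def find_vacuolar_motif(protein, window=50):
    """Return 0-based start positions of NPIR within the N-terminal window.

    `window` is a hypothetical cut-off for what counts as N-terminal.
    """
    region = protein.upper()[:window]
    return [match.start() for match in VACUOLAR_MOTIF.finditer(region)]


if __name__ == "__main__":
    # Illustrative sequence only; it is not one of the disclosed SDFs.
    print(find_vacuolar_motif("MKAFTLALFLALSLYLLPNPAHSRFNPIRLPTTHEPA"))
```

Motifs located outside the window are ignored, which matches the statement that this particular motif occurs in the N-terminal propeptide.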
[2298] These characteristics of signal proteins can be used to more tightly control the phenotypic expression of
introduced SDFs. In particular, associating the appropriate signal sequence with a specific SDF can allow sequestering
of the protein in specific organelles (plastids, as an example), secretion outside of the cell, targeting interaction with
particular receptors, etc. Hence, the inclusion of signal proteins in constructs involving the SDFs of the invention
increases the range of manipulation of SDF phenotypic expression. The nucleotide sequence of the signal peptide can
be isolated from characterized genes using common molecular biological techniques or can be synthesized in vitro.
[2299] In addition, the native signal peptide sequences, both amino acid and nucleotide, described in the REF and
SEQ tables can be used to modulate polypeptide transport. Further variants of the native signal peptides described in
the REF and SEQ tables are contemplated. Insertions, deletions, or substitutions can be made. Such variants will
retain at least one of the functions of the native signal peptide as well as exhibiting some degree of sequence identity
to the native sequence.

[2300] Also, fragments of the signal peptides of the invention are useful and can be fused with other signal peptides
of interest to modulate transport of a polypeptide.

V. Transformation Techniques

[2301] A wide range of techniques for inserting exogenous polynucleotides are known for a number of host cells,
including, without limitation, bacterial, yeast, mammalian, insect and plant cells.

[2302] Techniques for transforming a wide variety of higher plant species are well known and described in the
technical and scientific literature. See, e.g., Weising et al., Ann. Rev. Genet. 22:421 (1988); and Christou, Euphytica, v. 85,
n. 1-3: 13-27 (1995).

[2303] DNA constructs of the invention may be introduced into the genome of the desired plant host by a variety of
conventional techniques. For example, the DNA construct may be introduced directly into the genomic DNA of the
plant cell using techniques such as electroporation and microinjection of plant cell protoplasts, or the DNA constructs
can be introduced directly to plant tissue using ballistic methods, such as DNA particle bombardment. Alternatively,
the DNA constructs may be combined with suitable T-DNA flanking regions and introduced into a conventional
Agrobacterium tumefaciens host vector. The virulence functions of the Agrobacterium tumefaciens host will direct the
insertion of the construct and adjacent marker into the plant cell DNA when the cell is infected by the bacteria (McCormac
et al., Mol. Biotechnol. 8:199 (1997); Hamilton, Gene 200:107 (1997); Salomon et al., EMBO J. 3:141 (1984);
Herrera-Estrella et al., EMBO J. 2:987 (1983)).

[2304] Microinjection techniques are known in the art and well described in the scientific and patent literature. The
introduction of DNA constructs using polyethylene glycol precipitation is described in Paszkowski et al., EMBO J. 3:
2717 (1984). Electroporation techniques are described in Fromm et al., Proc. Natl. Acad. Sci. USA 82:5824 (1985).
Ballistic transformation techniques are described in Klein et al., Nature 327:70 (1987). Agrobacterium tumefaciens-
mediated transformation techniques, including disarming and use of binary or cointegrate vectors, are well described
in the scientific literature. See, for example, Hamilton, CM, Gene 200:107 (1997); Muller et al., Mol. Gen. Genet. 227:
171 (1987); Komari et al., Plant J. 10:165 (1996); Venkateswarlu et al., Biotechnology 9:1103 (1991); Gleave, AP,
Plant Mol. Biol. 20:1203 (1992); Graves and Goldman, Plant Mol. Biol. 7:34 (1986); and Gould et al., Plant Physiology

[2305] Transformed plant cells which are derived by any of the above transformation techniques can be cultured to
regenerate a whole plant that possesses the transformed genotype and thus the desired phenotype, such as
seedlessness. Such regeneration techniques rely on manipulation of certain phytohormones in a tissue culture growth medium,
typically relying on a biocide and/or herbicide marker which has been introduced together with the desired nucleotide
sequences. Plant regeneration from cultured protoplasts is described in Evans et al., Protoplasts Isolation and Culture
in "Handbook of Plant Cell Culture," pp. 124-176, MacMillan Publishing Company, New York, 1983; and Binding,
Regeneration of Plants, Plant Protoplasts, pp. 21-73, CRC Press, Boca Raton, 1988. Regeneration can also be obtained
from plant callus, explants, organs, or parts thereof. Such regeneration techniques are described generally in Klee et
al., Ann. Rev. of Plant Phys. 38:467 (1987). Regeneration of monocots (rice) is described by Hosoyama et al. (Biosci.
Biotechnol. Biochem. 58:1500 (1994)) and by Ghosh et al. (J. Biotechnol. 32:1 (1994)). The nucleic acids of the
invention can be used to confer desired traits on essentially any plant.

[2306] Thus the invention has use over a broad range of plants, including species from the genera Anacardium,
Arachis, Asparagus, Atropa, Avena, Brassica, Citrus, Citrullus, Capsicum, Carthamus, Cocos, Coffea, Cucumis,
Cucurbita, Daucus, Elaeis, Fragaria, Glycine, Gossypium, Helianthus, Heterocallis, Hordeum, Hyoscyamus, Lactuca,
Linum, Lolium, Lupinus, Lycopersicon, Malus, Manihot, Majorana, Medicago, Nicotiana, Olea, Oryza, Panicum,
Pannesetum, Persea, Phaseolus, Pistachia, Pisum, Pyrus, Prunus, Raphanus, Ricinus, Secale, Senecio, Sinapis,
Solanum, Sorghum, Theobromus, Trigonella, Triticum, Vicia, Vitis, Vigna, and Zea.

[2307] One of skill will recognize that after the expression cassette is stably incorporated in transgenic plants and 
confirmed to be operable, it can be introduced into other plants by sexual crossing. Any of a number of standard 
breeding techniques can be used, depending upon the species to be crossed.

[2308] The particular sequences of SDFs identified are provided in the attached REF AND SEQ TABLES 1 AND 2.
One of ordinary skill in the art, having this data, can obtain cloned DNA fragments, synthetic DNA fragments or
polypeptides constituting desired sequences by recombinant methodology known in the art or described herein.

EXAMPLES 


[2309] The invention is illustrated by way of the following examples. The invention is not limited by these examples 
as the scope of the invention is defined solely by the claims following. 

EXAMPLE 1: cDNA PREPARATION 

[2310] A number of the nucleotide sequences disclosed in REF AND SEQ TABLES 1 AND 2 herein as representative
of the SDFs of the invention can be obtained by sequencing genomic DNA (gDNA) and/or cDNA from corn plants
grown from HYBRID SEED # 35A19, purchased from Pioneer Hi-Bred International, Inc., Supply Management, P.O.
Box 256, Johnston, Iowa 50131-0256.

[2311] A number of the nucleotide sequences disclosed in REF AND SEQ TABLES 1 AND 2 herein as representative
of the SDFs of the invention can also be obtained by sequencing genomic DNA from Arabidopsis thaliana,
Wassilewskija ecotype, or by sequencing cDNA obtained from mRNA from such plants as described below. This is a true breeding
strain. Seeds of the plant are available from the Arabidopsis Biological Resource Center at the Ohio State University,
under the accession number CS2360. Seeds of this plant were deposited under the terms and conditions of the
Budapest Treaty at the American Type Culture Collection, Manassas, VA on August 31, 1999, and were assigned ATCC
No. PTA-595.

[2312] Other methods for cloning full-length cDNA are described, for example, by Seki et al., Plant Journal 15:707-720
(1998) "High-efficiency cloning of Arabidopsis full-length cDNA by biotinylated Cap trapper"; Maruyama et al., Gene
138:171 (1994) "Oligo-capping a simple method to replace the cap structure of eukaryotic mRNAs with
oligoribonucleotides"; and WO 96/34981.

[2313] Tissues were, or each organ was, individually pulverized and frozen in liquid nitrogen. Next, the samples were
homogenized in the presence of detergents and then centrifuged. The debris and nuclei were removed from the sample
and more detergents were added to the sample. The sample was centrifuged and the debris was removed. Then the
sample was applied to a 2M sucrose cushion to isolate polysomes. The RNA was isolated by treatment with detergents
and proteinase K followed by ethanol precipitation and centrifugation. The polysomal RNA from the different tissues
was pooled according to the following mass ratios: 15/15/1 for male inflorescences, female inflorescences and root,
respectively. The pooled material was then used for cDNA synthesis by the methods described below.
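
The pooling step above is a simple proportional split. A minimal sketch, assuming a hypothetical helper name and an arbitrary total mass chosen only for illustration, converts a parts ratio such as 15/15/1 into the mass of RNA to take from each tissue:

```python
def pool_masses(total_ug, parts):
    """Split a total RNA mass (micrograms) according to a parts ratio.

    `parts` maps a tissue name to its number of parts, e.g. 15/15/1.
    """
    total_parts = sum(parts.values())
    if total_parts <= 0:
        raise ValueError("ratio must contain at least one positive part")
    return {tissue: total_ug * n / total_parts for tissue, n in parts.items()}


if __name__ == "__main__":
    # The 15/15/1 ratio is from the text; the 31 ug total is an assumption.
    corn_ratio = {"male inflorescence": 15, "female inflorescence": 15, "root": 1}
    for tissue, mass in pool_masses(31.0, corn_ratio).items():
        print(f"{tissue}: {mass:.2f} ug")
```

The same function applies to any other parts ratio, for example nine parts of one tissue to one part of another.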
[2314] Starting material for cDNA synthesis for the exemplary corn cDNA clones with sequences presented in REF
AND SEQ TABLES 1 AND 2 was poly(A)-containing polysomal mRNAs from inflorescences and root tissues of corn
plants grown from HYBRID SEED # 35A19. Male inflorescences and female (pre- and post-fertilization) inflorescences
were isolated at various stages of development. Selection for poly(A)-containing polysomal RNA was done using oligo
d(T) cellulose columns, as described by Cox and Goldberg, "Plant Molecular Biology: A Practical Approach", pp. 1-35,
Shaw ed., c. 1988 by IRL, Oxford. The quality and the integrity of the polyA+ RNAs were evaluated.
[2315] Starting material for cDNA synthesis for the exemplary Arabidopsis cDNA clones with sequences presented
in REF AND SEQ TABLES 1 AND 2 was polysomal RNA isolated from the top-most inflorescence tissues of Arabidopsis
thaliana Wassilewskija (Ws.) and from roots of Arabidopsis thaliana Landsberg erecta (L. er.), also obtained from the
Arabidopsis Biological Resource Center. Nine parts inflorescence to every part root was used, as measured by wet
mass. Tissue was pulverized and exposed to liquid nitrogen. Next, the sample was homogenized in the presence of
detergents and then centrifuged. The debris and nuclei were removed from the sample and more detergents were
added to the sample. The sample was centrifuged and the debris was removed and the sample was applied to a 2M
sucrose cushion to isolate polysomal RNA (Cox et al., "Plant Molecular Biology: A Practical Approach", pp. 1-35, Shaw
ed., c. 1988 by IRL, Oxford). The polysomal RNA was used for cDNA synthesis by the methods described below.
Polysomal mRNA was then isolated as described above for corn cDNA. The quality of the RNA was assessed
electrophoretically.

[2316] Following preparation of the mRNAs from various tissues as described above, selection of mRNA with intact
5' ends and specific attachment of an oligonucleotide tag to the 5' end of such mRNA was performed using either a
chemical or enzymatic approach. Both techniques take advantage of the presence of the "cap" structure, which
characterizes the 5' end of most intact mRNAs and which comprises a guanosine generally methylated once, at the 7
position.

[2317] The chemical modification approach involves the optional elimination of the 2', 3'-cis diol of the 3' terminal
ribose, the oxidation of the 2', 3'-cis diol of the ribose linked to the cap of the 5' ends of the mRNAs into a dialdehyde,
and the coupling of the dialdehyde so obtained to a derivatized oligonucleotide tag. Further detail regarding the
chemical approaches for obtaining mRNAs having intact 5' ends is disclosed in International Application No.
WO96/34981 published November 7, 1996.

[2318] The enzymatic approach for ligating the oligonucleotide tag to the intact 5' ends of mRNAs involves the removal
of the phosphate groups present on the 5' ends of uncapped incomplete mRNAs, the subsequent decapping of mRNAs
having intact 5' ends and the ligation of the phosphate present at the 5' end of the decapped mRNA to an oligonucleotide
tag. Further detail regarding the enzymatic approaches for obtaining mRNAs having intact 5' ends is disclosed in
Dumas Milne Edwards J.B. (Doctoral Thesis of Paris VI University, Le clonage des ADNc complets: difficultés et
perspectives nouvelles. Apports pour l'étude de la régulation de l'expression de la tryptophane hydroxylase de rat, 20
Dec. 1993), EPO 625572 and Kato et al., Gene 150:243-250 (1994).

[2319] In both the chemical and the enzymatic approach, the oligonucleotide tag has a restriction enzyme site (e.g.
an EcoRI site) therein to facilitate later cloning procedures. Following attachment of the oligonucleotide tag to the
mRNA, the integrity of the mRNA is examined by performing a Northern blot using a probe complementary to the
oligonucleotide tag.
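
A probe complementary to a tag can be derived from the tag sequence by reverse complementation. The following is a minimal sketch for DNA sequences written 5' to 3'; the function name and the example tag are hypothetical, since the actual tag sequence is not given in the text.

```python
# Translation table for the four standard DNA bases (upper and lower case).
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(dna):
    """Return the reverse complement of a DNA sequence written 5' to 3'."""
    return dna.translate(_COMPLEMENT)[::-1]


if __name__ == "__main__":
    # Hypothetical tag containing an EcoRI site (GAATTC); not the real tag.
    tag = "ATCAAGAATTCGCACGAGACC"
    print(reverse_complement(tag))
```

Because GAATTC is palindromic, the EcoRI site reads the same in the tag and in its reverse complement.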

[2320] For the mRNAs joined to oligonucleotide tags using either the chemical or the enzymatic method, first strand
cDNA synthesis is performed using an oligo-dT primer with reverse transcriptase. This oligo-dT primer can contain an
internal tag of at least 4 nucleotides, which can be different from one mRNA preparation to another. Methylated dCTP
is used for cDNA first strand synthesis to protect the internal EcoRI sites from digestion during subsequent steps. The
first strand cDNA is precipitated using isopropanol after removal of RNA by alkaline hydrolysis to eliminate residual
primers.
[2321] Second strand cDNA synthesis is conducted using a DNA polymerase, such as Klenow fragment, and a primer
corresponding to the 5' end of the ligated oligonucleotide. The primer is typically 20-25 bases in length. Methylated
dCTP is used for second strand synthesis in order to protect internal EcoRI sites in the cDNA from digestion during
the cloning process.

[2322] Following second strand synthesis, the full-length cDNAs are cloned into a phagemid vector, such as
pBlueScript™ (Stratagene). The ends of the full-length cDNAs are blunted with T4 DNA polymerase (Biolabs) and the cDNA
is digested with EcoRI. Since methylated dCTP is used during cDNA synthesis, the EcoRI site present in the tag is the
only hemi-methylated site; hence the only site susceptible to EcoRI digestion. In some instances, to facilitate
subcloning, a Hind III adapter is added to the 3' end of full-length cDNAs.
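
To see which positions in a given cDNA depend on the methylation protection described above, internal EcoRI recognition sequences can be listed with a simple scan. This is a minimal sketch; the function name and the example sequence are hypothetical, and the scan only reports where GAATTC occurs. It does not model methylation or enzyme behavior.

```python
ECORI_SITE = "GAATTC"  # EcoRI recognition sequence


def find_sites(sequence, site=ECORI_SITE):
    """Return 0-based start positions of every occurrence of `site`."""
    sequence = sequence.upper()
    positions = []
    start = sequence.find(site)
    while start != -1:
        positions.append(start)
        # Step by one so overlapping occurrences would also be reported.
        start = sequence.find(site, start + 1)
    return positions


if __name__ == "__main__":
    # Illustrative sequence only; not one of the disclosed cDNAs.
    print(find_sites("AAGAATTCTTGAATTCAA"))
```

A cDNA for which the scan returns an empty list has no internal EcoRI sites at all.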

[2323] The full-length cDNAs are then size fractionated using either exclusion chromatography (AcA, Biosepra) or
electrophoretic separation, which yields 3 to 6 different fractions. The full-length cDNAs are then directionally cloned
into pBlueScript™ using either the EcoRI and SmaI restriction sites or, when the Hind III adapter is present in
the full-length cDNAs, the EcoRI and Hind III restriction sites. The ligation mixture is transformed, preferably by
electroporation, into bacteria, which are then propagated under appropriate antibiotic selection.
[2324] Clones containing the oligonucleotide tag attached to full-length cDNAs are selected as follows. 
[2325] The plasmid cDNA libraries made as described above are purified (e.g. by a column available from Qiagen).
A positive selection of the tagged clones is performed as follows. Briefly, in this selection procedure, the plasmid DNA
is converted to single stranded DNA using phage F1 gene II endonuclease in combination with an exonuclease (Chang
et al., Gene 127:95 (1993)) such as exonuclease III or T7 gene 6 exonuclease. The resulting single stranded DNA is
then purified using paramagnetic beads as described by Fry et al., Biotechniques 13:124 (1992). Here the single
stranded DNA is hybridized with a biotinylated oligonucleotide having a sequence corresponding to the 3' end of the
oligonucleotide tag. Preferably, the primer has a length of 20-25 bases. Clones including a sequence complementary
to the biotinylated oligonucleotide are selected by incubation with streptavidin coated magnetic beads followed by
magnetic capture. After capture of the positive clones, the plasmid DNA is released from the magnetic beads and
converted into double stranded DNA using a DNA polymerase such as ThermoSequenase™ (obtained from Amersham
Pharmacia Biotech). Alternatively, protocols such as the Gene Trapper™ kit (Gibco BRL) can be used. The double
stranded DNA is then transformed, preferably by electroporation, into bacteria. The percentage of positive clones
having the 5' tag oligonucleotide is typically estimated to be between 90 and 98% from dot blot analysis.
[2326] Following transformation, the libraries are ordered in microtiter plates and sequenced. The Arabidopsis library
was deposited at the American Type Culture Collection on January 7, 2000 as "E-coli liba 010600" under the accession
number PTA-1161.

EXAMPLE 2: SOUTHERN HYBRIDIZATIONS


[2327] The SDFs of the invention can be used in Southern hybridizations as described above. The following describes
extraction of DNA from nuclei of plant cells, digestion of the nuclear DNA and separation by length, transfer of the
separated fragments to membranes, preparation of probes for hybridization, hybridization and detection of the
hybridized probe.

[2328] The procedures described herein can be used to isolate related polynucleotides or for diagnostic purposes.
Moderate stringency hybridization conditions, as defined above, are described in the present example. These
conditions result in detection of hybridization between sequences having at least 70% sequence identity. As described above,
the hybridization and wash conditions can be changed to reflect the desired percentage of sequence identity between
probe and target sequences that can be detected.
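
The percentage of sequence identity referred to above can be illustrated with a simple count over two sequences that have already been aligned. This is a minimal sketch under stated assumptions: the two strings are the same length, gaps are written as "-", and gap positions are excluded from the count. Other conventions for scoring gaps exist, and the function name and example sequences are hypothetical. The sketch does not predict hybridization behavior, which also depends on the wash and hybridization conditions.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent identity over the ungapped columns of a pairwise alignment."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    matches = 0
    for base_a, base_b in zip(aligned_a.upper(), aligned_b.upper()):
        if base_a == "-" or base_b == "-":
            continue  # assumption: gap columns are not scored
        compared += 1
        if base_a == base_b:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


if __name__ == "__main__":
    probe = "ACGTACGTAC"   # illustrative sequences only
    target = "ACGTTCGTAA"
    identity = percent_identity(probe, target)
    print(f"{identity:.1f}% identity; meets 70% threshold: {identity >= 70.0}")
```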

[2329] In the following procedure, a probe for hybridization is produced from two PCR reactions using two primers
from genomic sequence of Arabidopsis thaliana. As described above, the particular template for generating the probe
can be any desired template.

[2330] The first PCR product is assessed to confirm that it is of the expected size. Then
the product of the first PCR is used as a template, with the same pair of primers used in the first PCR, in a second
PCR that produces a labeled product used as the probe.

[2331] Fragments detected by hybridization, or other bands of interest, can be isolated from gels used to separate
genomic DNA fragments by known methods for further purification and/or characterization. 

Buffers for nuclear DNA extraction 

[2332] 


1. 10X HB

                              1000 ml
   40 mM spermidine           10.2 g     Spermine (Sigma S-2876) and spermidine (Sigma S-2501)
   10 mM spermine             3.5 g      stabilize chromatin and the nuclear membrane
   0.1 M EDTA (disodium)      37.2 g     EDTA inhibits nuclease
   0.1 M Tris                 12.1 g     Buffer
   0.8 M KCl                  59.6 g     Adjusts ionic strength for stability of nuclei

Adjust pH to 9.5 with 10 N NaOH. It appears that there is a nuclease present in leaves. Use of pH 9.5 appears
to inactivate this nuclease.


2. 2 M sucrose (684 g per 1000 ml) 

Heat about half the final volume of water to about 50°C. Add the sucrose slowly then bring the mixture to close 
to final volume; stir constantly until it has dissolved. Bring the solution to volume. 


3. Sarkosyl solution (lyses nuclear membranes)

                                   1000 ml
   N-lauroyl sarcosine (Sarkosyl)  20.0 g
   0.1 M Tris                      12.1 g
   0.04 M EDTA (disodium)          14.9 g

Adjust the pH to 9.5 after all the components are dissolved and bring up to the proper volume.


4. 20% Triton X-100 

80 ml Triton X-100 

320 ml 1X HB (without β-ME and PMSF)

Prepare in advance; Triton takes some time to dissolve 
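
The masses listed for the buffer components can be cross-checked from molarity, formula weight and volume. The following is a minimal sketch. The formula weights are assumptions: the text does not state which salt forms are used, so commonly sold forms (spermidine trihydrochloride, spermine tetrahydrochloride, disodium EDTA dihydrate, Tris base) are assumed here, and all names in the code are hypothetical.

```python
# Assumed formula weights in g/mol; the salt forms are not stated in the text.
FORMULA_WEIGHT = {
    "spermidine (trihydrochloride)": 254.63,
    "spermine (tetrahydrochloride)": 348.18,
    "EDTA (disodium, dihydrate)": 372.24,
    "Tris (base)": 121.14,
    "KCl": 74.55,
    "sucrose": 342.30,
}


def grams_needed(molar, formula_weight, volume_l=1.0):
    """Mass in grams = molarity (mol/l) x formula weight (g/mol) x volume (l)."""
    return molar * formula_weight * volume_l


if __name__ == "__main__":
    # Target concentrations per 1000 ml are taken from the buffer recipes.
    recipe = [
        ("spermidine (trihydrochloride)", 0.040),
        ("spermine (tetrahydrochloride)", 0.010),
        ("EDTA (disodium, dihydrate)", 0.1),
        ("Tris (base)", 0.1),
        ("KCl", 0.8),
        ("sucrose", 2.0),
    ]
    for name, molar in recipe:
        grams = grams_needed(molar, FORMULA_WEIGHT[name])
        print(f"{name}: {grams:.1f} g per 1000 ml")
```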

A. Procedure 

[2333] 

1. Prepare 1X "H" buffer (keep ice-cold during use)

                              1000 ml
   10X HB                     100 ml
   2 M sucrose                250 ml     a non-ionic osmoticum
   Water                      634 ml

   Added just before use:
   100 mM PMSF*               10 ml      a protease inhibitor; protects nuclear membrane proteins
   β-mercaptoethanol          1 ml       inactivates nuclease by reducing disulfide bonds

   *100 mM PMSF (phenyl methyl sulfonyl fluoride, Sigma P-7626): add 0.0875 g to 5 ml 100% ethanol

2. Homogenize the tissue in a blender (use 300-400 ml of 1X HB per blender). Be sure that you use 5-10 ml of HB
buffer per gram of tissue. Blenders generate heat, so be sure to keep the homogenate cold. It is necessary to put
the blenders in ice periodically.

3. Add the 20% Triton X-100 (25 ml per liter of homogenate) and gently stir on ice for 20 min. This lyses plastid,
but not nuclear, membranes.

4. Filter the tissue suspension through several nylon filters into an ice-cold beaker. The first filtration is through a
250-micron membrane; the second is through an 85-micron membrane; the third is through a 50-micron membrane;
and the fourth is through a 20-micron membrane. Use a large funnel to hold the filters. Filtration can be sped up
by gently squeezing the liquid through the filters.

5. Centrifuge the filtrate at 1200 x g for 20 min. at 4°C to pellet the nuclei.

6. Discard the dark green supernatant. The pellet will have several layers to it. One is starch; it is white and gritty.
The nuclei are gray and soft. In the early steps, there may be a dark green and somewhat viscous layer of
chloroplasts.

Wash the pellets in about 25 ml cold H buffer (with Triton X-100) and resuspend by swirling gently and pipetting.
After the pellets are resuspended, pellet the nuclei again at 1200-1300 x g. Discard the supernatant.

Repeat the wash 3-4 times until the supernatant has changed from a dark green to a pale green. This usually
happens after 3 or 4 resuspensions. At this point, the pellet is typically grayish white and very slippery. The
Triton X-100 in these repeated steps helps to destroy the chloroplasts and mitochondria that contaminate the
prep. Resuspend the nuclei for a final time in a total of 15 ml of H buffer and transfer the suspension to a sterile
125 ml Erlenmeyer flask.

7. Add 15 ml, dropwise, of cold 2% Sarkosyl, 0.1 M Tris, 0.04 M EDTA solution (pH 9.5) while swirling gently. This
lyses the nuclei. The solution will become very viscous.

8. Add 30 grams of CsCl and gently swirl at room temperature until the CsCl is in solution. The mixture will be gray,
white and viscous.

9. Centrifuge the solution at 11,400 x g at 4°C for at least 30 min. The longer this spin is, the firmer the protein pellicle.

10. The result is typically a clear green supernatant over a white pellet, and (perhaps) under a protein pellicle.
Carefully remove the solution under the protein pellicle and above the pellet. Determine the density of the solution
by weighing 1 ml of solution and add CsCl if necessary to bring to 1.57 g/ml. The solution contains dissolved solids
(sucrose etc.) and the refractive index alone will not be an accurate guide to CsCl concentration.


11. Add 20 µl of 10 mg/ml EtBr per ml of solution.



12. Centrifuge at 184,000 x g for 16 to 20 hours in a fixed-angle rotor. 

13. Remove the dark red supernatant that is at the top of the tube with a plastic transfer pipette and discard. 
Carefully remove the DNA band with another transfer pipette. The DNA band is usually visible in room light; oth- 
erwise, use a long wave UV light to locate the band. 

14. Extract the ethidium bromide with isopropanol saturated with water and salt. Once the solution is clear, extract 
at least two more times to ensure that all of the EtBr is gone. Be very gentle, as it is very easy to shear the DNA 
at this step. This extraction may take a while because the DNA solution tends to be very viscous. If the solution is 
too viscous, dilute it with TE. 

15. Dialyze the DNA for at least two days against several changes (at least three times) of TE (10 mM Tris, 1mM 
EDTA, pH 8) to remove the cesium chloride. 

16. Remove the dialyzed DNA from the tubing. If the dialyzed DNA solution contains a lot of debris, centrifuge the
DNA solution at least at 2500 x g for 10 min. and carefully transfer the clear supernatant to a new tube. Read the
A260 concentration of the DNA.

17. Assess the quality of the DNA by agarose gel electrophoresis (1% agarose gel) of the DNA. Load 50 ng and
100 ng (based on the OD reading) and compare it with known and good quality DNA. Undigested lambda DNA
and a lambda-HindIII-digested DNA are good molecular weight markers.
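
As a rough illustration of steps 16 and 17, an A260 reading can be converted to a concentration and then to a gel-loading volume. The sketch below assumes the conventional factor of about 50 µg/ml per A260 unit for double-stranded DNA in a 1 cm path length, which the protocol does not state; the function names and the example reading are hypothetical.

```python
# Conventional factor for double-stranded DNA (an assumption, not from the protocol):
# an A260 of 1.0 in a 1 cm cuvette corresponds to roughly 50 ug/ml.
DSDNA_UG_PER_ML_PER_A260 = 50.0


def dna_conc_ng_per_ul(a260, dilution_factor=1.0):
    """Concentration of the undiluted stock (ug/ml is numerically equal to ng/ul)."""
    return a260 * dilution_factor * DSDNA_UG_PER_ML_PER_A260


def load_volume_ul(target_ng, conc_ng_per_ul):
    """Volume of stock that contains target_ng of DNA."""
    return target_ng / conc_ng_per_ul


# Hypothetical reading: A260 = 0.2 measured on a 1:10 dilution of the stock.
stock = dna_conc_ng_per_ul(a260=0.2, dilution_factor=10)
for nanograms in (50, 100):  # the two gel loads named in step 17
    print(f"{nanograms} ng load: {load_volume_ul(nanograms, stock):.2f} ul of stock")
```

If the computed loading volume is too small to pipette reliably, an intermediate dilution in TE is the usual workaround.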

Protocol for Digestion of Genomic DNA 

Protocol: 
[2334] 

1. The relative amounts of DNA for different crop plants that provide approximately a balanced number of genome
equivalents are given in Table 3. Note that due to the size of the wheat genome, wheat DNA will be underrepresented.
Lambda DNA provides a useful control for complete digestion.

2. Precipitate the DNA by adding 3 volumes of 100% ethanol. Incubate at -20°C for at least two hours. Yeast DNA
can be purchased and made up at the necessary concentration; therefore, no precipitation is necessary for yeast
DNA.

3. Centrifuge the solution at 11,400 x g for 20 min. Decant the ethanol carefully (be careful not to disturb the pellet).
Be sure that the residual ethanol is completely removed either by vacuum desiccation or by carefully wiping the
sides of the tubes with a clean tissue.

4. Resuspend the pellet in an appropriate volume of water. Be sure the pellet is fully resuspended before proceeding 
to the next step. This may take about 30 min. 

5. Add the appropriate volume of 10X reaction buffer provided by the manufacturer of the restriction enzyme to 
the resuspended DNA followed by the appropriate volume of enzymes. Be sure to mix it properly by slowly swirling 
the tubes. 

6. Set-up the lambda digestion-control for each DNA that you are digesting. 

7. Incubate both the experimental and lambda digests overnight at 37°C. Spin down condensation in a microfuge 
before proceeding. 

8. After digestion, add 2 µl of loading dye (typically 0.25% bromophenol blue, 0.25% xylene cyanol in 15% Ficoll
or 30% glycerol) to the lambda-control digests and load in 1% TPE-agarose gel (TPE is 90 mM Tris-phosphate, 2
mM EDTA, pH 8). If the lambda DNA in the lambda control digests is completely digested, proceed with the
precipitation of the genomic DNA in the digests.

9. Precipitate the digested DNA by adding 3 volumes of 100% ethanol and incubating at -20°C for at least 2 hours
(preferably overnight).

EXCEPTION: Arabidopsis and yeast DNA are digested in an appropriate volume; they don't have to be precipitated.

10. Resuspend the DNA in an appropriate volume of TE (e.g., 22 µl x 50 blots = 1100 µl) and an appropriate volume
of 10X loading dye (e.g., 2.4 µl x 50 blots = 120 µl). Be careful in pipetting the loading dye - it is viscous. Be sure
you are pipetting the correct volume.


Table 3

Some guide points in digesting genomic DNA.

Species        Genome Size    Size Relative     Genome Equivalent to     Amount of DNA
                              to Arabidopsis    2 µg Arabidopsis DNA     per blot

Arabidopsis    120 Mb         1X                1X                       2 µg
Brassica       1,100 Mb       9.2X              0.54X                    10 µg
Corn           2,800 Mb       23.3X             0.43X                    20 µg
Cotton         2,300 Mb       19.2X             0.52X                    20 µg
Oat            11,300 Mb      94X               0.11X                    20 µg
Rice           400 Mb         3.3X              0.75X                    5 µg
Soybean        1,100 Mb       9.2X              0.54X                    10 µg
Sugarbeet      758 Mb         6.3X              0.8X                     10 µg
Sweetclover    1,100 Mb       9.2X              0.54X                    10 µg
Wheat          16,000 Mb      133X              0.08X                    20 µg
Yeast          15 Mb          0.12X             1X                       0.25 µg
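
The genome-equivalent column of Table 3 appears to follow from the other columns: the DNA loaded per blot, expressed relative to 2 µg of Arabidopsis DNA, divided by the genome size relative to Arabidopsis. The sketch below checks that reading for a few rows. It assumes the per-blot amounts are in µg and the Arabidopsis genome is taken as 120 Mb, as in the table; the function name is hypothetical.

```python
ARABIDOPSIS_GENOME_MB = 120.0   # reference genome size from Table 3
ARABIDOPSIS_UG_PER_BLOT = 2.0   # reference load from Table 3


def genome_equivalent(genome_mb, ug_per_blot):
    """Genome copies loaded, relative to 2 ug of Arabidopsis DNA."""
    relative_size = genome_mb / ARABIDOPSIS_GENOME_MB
    relative_mass = ug_per_blot / ARABIDOPSIS_UG_PER_BLOT
    return relative_mass / relative_size


# (genome size in Mb, ug per blot), values copied from Table 3
rows = {"Rice": (400, 5), "Corn": (2800, 20), "Yeast": (15, 0.25)}
for species, (size_mb, ug) in rows.items():
    print(f"{species}: {genome_equivalent(size_mb, ug):.2f}X")
```

The same relation explains why wheat is underrepresented: even at 20 µg per blot, its 16,000 Mb genome yields far fewer genome copies than the 2 µg Arabidopsis load.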




Protocol for Southern Blot Analysis

[2335] The digested DNA samples are electrophoresed in 1% agarose gels in 1X TPE buffer. Low-voltage, overnight
separations are preferred. The gels are stained with EtBr and photographed.

1. For blotting the gels, first incubate the gel in 0.25 N HCl (with gentle shaking) for about 15 min.

2. Then briefly rinse with water. The DNA is denatured by 2 incubations. Incubate (with shaking) in 0.5 M NaOH
in 1.5 M NaCl for 15 min.

3. The gel is then briefly rinsed in water and neutralized by incubating twice (with shaking) in 1.5 M Tris pH 7.5 in
1.5 M NaCl for 15 min.

4. A nylon membrane is prepared by soaking it in water for at least 5 min, then in 6X SSC for at least 15 min.
before use. (20X SSC is 175.3 g NaCl, 88.2 g sodium citrate per liter, adjusted to pH 7.0.)

5. The nylon membrane is placed on top of the gel and all bubbles in between are removed. The DNA is blotted
from the gel to the membrane using an absorbent medium, such as paper toweling and 6X SSC buffer. After the
transfer, the membrane may be lightly brushed with a gloved hand to remove any agarose sticking to the surface.

6. The DNA is then fixed to the membrane by UV crosslinking and baking at 80°C. The membrane is stored at 4°C 
until use. 
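
The 20X SSC recipe in step 4 can be checked against its nominal molarity, and the volume of stock needed for the 6X working solution follows from simple proportion. This is a minimal sketch: it assumes the sodium citrate is the trisodium dihydrate salt (formula weight about 294.10 g/mol) and uses 58.44 g/mol for NaCl; the function names and the 500 ml batch size are hypothetical.

```python
NACL_FW = 58.44                # g/mol
NA3_CITRATE_2H2O_FW = 294.10   # g/mol, assuming the dihydrate salt is used


def molarity(grams_per_liter, formula_weight):
    return grams_per_liter / formula_weight


def stock_volume_ml(stock_x, final_x, final_volume_ml):
    """Volume of concentrated stock needed (C1 * V1 = C2 * V2)."""
    return final_x / stock_x * final_volume_ml


nacl_m = molarity(175.3, NACL_FW)                  # about 3 M in 20X, 0.15 M in 1X
citrate_m = molarity(88.2, NA3_CITRATE_2H2O_FW)    # about 0.3 M in 20X, 0.015 M in 1X
print(f"20X SSC: {nacl_m:.2f} M NaCl, {citrate_m:.2f} M sodium citrate")
print(f"6X SSC, 500 ml: {stock_volume_ml(20, 6, 500):.0f} ml of 20X stock plus water")
```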
B. Protocol for PCR Amplification of Genomic Fragments in Arabidopsis

Amplification procedures: 

[2336] 


1. Mix the following in a 0.20 ml PCR tube or 96-well PCR plate:

Volume        Stock                                              Final Amount or Conc.

0.5 µl        ~10 ng/µl genomic DNA¹                             5 ng
2.5 µl        10X PCR buffer                                     20 mM Tris, 50 mM KCl
0.75 µl       50 mM MgCl2                                        1.5 mM
1 µl          10 pmol/µl Primer 1 (Forward)                      10 pmol
1 µl          10 pmol/µl Primer 2 (Reverse)                      10 pmol
0.5 µl        5 mM dNTPs                                         0.1 mM
0.1 µl        5 units/µl Platinum Taq™ (Life Technologies,       1 unit
              Gaithersburg, MD) DNA Polymerase
(to 25 µl)    Water
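
The final concentrations in the reaction mix follow from C1 x V1 = C2 x V2 in a 25 µl reaction, and the water volume is whatever remains. The sketch below reproduces the MgCl2 and dNTP entries and scales the recipe to a master mix. The function names are hypothetical, and the optional pipetting overage is an assumption that is not part of the protocol.

```python
REACTION_UL = 25.0

# Per-reaction volumes (ul) as listed in the mix above.
PER_REACTION_UL = {
    "genomic DNA": 0.5,
    "10X PCR buffer": 2.5,
    "50 mM MgCl2": 0.75,
    "primer 1": 1.0,
    "primer 2": 1.0,
    "5 mM dNTPs": 0.5,
    "Platinum Taq": 0.1,
}


def final_conc(stock_conc, stock_ul, total_ul=REACTION_UL):
    """Final concentration after dilution into the reaction (same units as stock)."""
    return stock_conc * stock_ul / total_ul


def water_ul(per_reaction=PER_REACTION_UL, total_ul=REACTION_UL):
    """Water needed to bring one reaction to the total volume."""
    return total_ul - sum(per_reaction.values())


def master_mix(n_reactions, overage=0.0, per_reaction=PER_REACTION_UL):
    """Scale every component; overage (e.g. 0.1 for 10%) is an assumed safety margin."""
    factor = n_reactions * (1.0 + overage)
    mix = {name: vol * factor for name, vol in per_reaction.items()}
    mix["water"] = water_ul(per_reaction) * factor
    return mix


print(f"MgCl2: {final_conc(50.0, 0.75):.2f} mM, dNTPs: {final_conc(5.0, 0.5):.2f} mM")
print(f"water per reaction: {water_ul():.2f} ul")
```

In practice the template DNA is usually left out of a shared master mix and added to each tube separately.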



2. The template DNA is amplified using a Perkin Elmer 9700 PCR machine:

1) 94°C for 10 min. followed by

2) 5 cycles:
94 °C - 30 sec
62 °C - 30 sec
72 °C - 3 min

3) 5 cycles:
94 °C - 30 sec
58 °C - 30 sec
72 °C - 3 min

4) 25 cycles:
94 °C - 30 sec
53 °C - 30 sec
72 °C - 3 min

5) 72°C for 7 min. Then the reactions are stopped by chilling to 4°C.
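
The cycling program above lowers the annealing temperature in three stages (62°C, 58°C, 53°C). One way to keep such a program unambiguous is to write it as data and derive totals from it. The sketch below is illustrative only: the data layout and function names are hypothetical, and the computed time covers programmed holds only, not ramping or the final 4°C hold.

```python
# Each stage: (label, number of cycles, [(temperature in deg C, hold in seconds), ...])
PROGRAM = [
    ("initial denaturation", 1, [(94, 600)]),
    ("annealing at 62", 5, [(94, 30), (62, 30), (72, 180)]),
    ("annealing at 58", 5, [(94, 30), (58, 30), (72, 180)]),
    ("annealing at 53", 25, [(94, 30), (53, 30), (72, 180)]),
    ("final extension", 1, [(72, 420)]),
]


def total_hold_minutes(program):
    """Sum of all programmed hold times, in minutes."""
    seconds = sum(
        cycles * sum(hold for _temp, hold in steps)
        for _label, cycles, steps in program
    )
    return seconds / 60


def amplification_cycles(program):
    """Count only the multi-step (denature/anneal/extend) cycles."""
    return sum(cycles for _label, cycles, steps in program if len(steps) > 1)


print(amplification_cycles(PROGRAM), "cycles,", total_hold_minutes(PROGRAM), "min of holds")
```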


[2337] The procedure can be adapted to a multi-well format if necessary. 

Quantification and Dilution of PCR Products: 

[2338] 

1. The product of the PCR is analyzed by electrophoresis in a 1% agarose gel. A linearized plasmid DNA can be
used as a quantification standard (usually at 50, 100, 200, and 400 ng). These will be used as references to
approximate the amount of PCR products. HindIII-digested Lambda DNA is useful as a molecular weight marker.
The gel can be run fairly quickly; e.g., at 100 volts. The standard gel is examined to determine that the size of the
PCR products is consistent with the expected size and if there are significant extra bands or smeary products in
the PCR reactions.

2. The amounts of PCR products can be estimated on the basis of the plasmid standard.

3. For the small number of reactions that produce extraneous bands, a small amount of DNA from bands with the
correct size can be isolated by dipping a sterile 10-µl tip into the band while viewing through a UV transilluminator.
The small amount of agarose gel (with the DNA fragment) is used in the labeling reaction.

C. Protocol for PCR-DIG-Labeling of DNA

Solutions:
[2339]

Reagents in PCR reactions (diluted PCR products, 10X PCR Buffer, 50 mM MgCl2, 5 U/µl Platinum Taq Polymerase,
and the primers)

10X dNTP + DIG-11-dUTP [1:5]: (2 mM dATP, 2 mM dCTP, 2 mM dGTP, 1.65 mM dTTP, 0.35 mM DIG-11-dUTP)

10X dNTP + DIG-11-dUTP [1:10]: (2 mM dATP, 2 mM dCTP, 2 mM dGTP, 1.81 mM dTTP, 0.19 mM DIG-11-dUTP)
10X dNTP + DIG-11-dUTP [1:15]: (2 mM dATP, 2 mM dCTP, 2 mM dGTP, 1.875 mM dTTP, 0.125 mM DIG-11-dUTP)

TE buffer (10 mM Tris, 1 mM EDTA, pH 8)

Maleate buffer: In 700 ml of deionized distilled water, dissolve 11.61 g maleic acid and 8.77 g NaCl. Add NaOH to
adjust the pH to 7.5. Bring the volume to 1 L. Stir for 15 min. and sterilize.

10% blocking solution: In 80 ml deionized distilled water, dissolve 1.16 g maleic acid. Next, add NaOH to adjust
the pH to 7.5. Add 10 g of the blocking reagent powder (Boehringer Mannheim, Indianapolis, IN, Cat. no. 1096176).
Heat to 60°C while stirring to dissolve the powder. Adjust the volume to 100 ml with water. Stir and sterilize.

1% blocking solution: Dilute the 10% stock to 1% using the maleate buffer.

Buffer 3 (100 mM Tris, 100 mM NaCl, 50 mM MgCl2, pH 9.5). Prepared from autoclaved solutions of 1 M Tris pH
9.5, 5 M NaCl, and 1 M MgCl2 in autoclaved distilled water.

Procedure:

[2340] 

1. PCR reactions are performed in 25 µl volumes containing:

PCR buffer                   1X
MgCl2                        1.5 mM
10X dNTP + DIG-11-dUTP       1X (please see the note below)
Platinum Taq™ Polymerase     1 unit
10 pg probe DNA
10 pmol primer 1

Note:

                                   Use for:
10X dNTP + DIG-11-dUTP (1:5)       < 1 kb
10X dNTP + DIG-11-dUTP (1:10)      1 kb to 1.8 kb
10X dNTP + DIG-11-dUTP (1:15)      > 1.8 kb

2. The PCR reaction uses the following amplification cycles:

1) 94°C for 10 min.

2) 5 cycles:
95°C - 30 sec
61°C - 1 min
73°C - 5 min

3) 5 cycles:
95°C - 30 sec
59°C - 1 min
75°C - 5 min

4) 25 cycles:
95°C - 30 sec
51°C - 1 min
73°C - 5 min

5) 72°C for 8 min. The reactions are terminated by chilling to 4°C (hold).


3. The products are analyzed by electrophoresis in a 1% agarose gel, comparing to an aliquot of the unlabelled
probe starting material.

4. The amount of DIG-labeled probe is determined as follows: 

Make serial dilutions of the diluted control DNA in dilution buffer (TE: 10 mM Tris and 1 mM EDTA, pH 8) as
shown in the following table:

DIG-labeled control DNA     Stepwise Dilution       Final Conc.
starting conc.                                      (Dilution Name)

5 ng/µl                     1 µl in 49 µl TE        100 pg/µl (A)
100 pg/µl (A)               25 µl in 25 µl TE       50 pg/µl (B)
50 pg/µl (B)                25 µl in 25 µl TE       25 pg/µl (C)
25 pg/µl (C)                20 µl in 30 µl TE       10 pg/µl (D)

a. Serial dilutions of a DIG-labeled standard DNA ranging from 100 pg to 10 pg are spotted onto a positively
charged nylon membrane, marking the membrane lightly with a pencil to identify each dilution.

b. Serial dilutions (e.g., 1:50, 1:2500, 1:10,000) of the newly labeled DNA probe are spotted.

c. The membrane is fixed by UV crosslinking.

d. The membrane is wetted with a small amount of maleate buffer and then incubated in 1% blocking solution
for 15 min at room temp.

e. The labeled DNA is then detected using alkaline phosphatase conjugated anti-DIG antibody (Boehringer
Mannheim, Indianapolis, IN, cat. no. 1093274) and an NBT substrate according to the manufacturer's
instructions.

f. Spot intensities of the control and experimental dilutions are then compared to estimate the concentration
of the PCR-DIG-labeled probe.
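
The stepwise dilution of the control DNA, and the back-calculation in step f, can be sketched as follows. Each step's concentration is the previous one multiplied by the transferred volume over the total volume. The function names are hypothetical, and the matched spot in the usage example is an invented illustration, not a result from the protocol.

```python
def dilute(conc, transfer_ul, diluent_ul):
    """Concentration after mixing transfer_ul of sample with diluent_ul of TE."""
    return conc * transfer_ul / (transfer_ul + diluent_ul)


def control_series(start_pg_per_ul=5000.0):
    """Reproduce dilutions A-D of the control DNA (5 ng/ul = 5000 pg/ul)."""
    steps = [("A", 1, 49), ("B", 25, 25), ("C", 25, 25), ("D", 20, 30)]
    series, conc = {}, start_pg_per_ul
    for name, transfer_ul, te_ul in steps:
        conc = dilute(conc, transfer_ul, te_ul)
        series[name] = conc
    return series


def probe_conc_pg_per_ul(matched_standard_pg_per_ul, probe_dilution_factor):
    """If a diluted probe spot matches a standard spot, scale back to the undiluted probe."""
    return matched_standard_pg_per_ul * probe_dilution_factor


for name, conc in control_series().items():
    print(f"dilution {name}: {conc:.0f} pg/ul")

# Hypothetical reading: the 1:2500 probe spot looks like standard D (10 pg/ul).
print(probe_conc_pg_per_ul(10.0, 2500) / 1000, "ng/ul estimated for the undiluted probe")
```

The comparison is visual, so the estimate is only as good as the match between spot intensities; spotting equal volumes of standard and probe is assumed.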

D. Prehybridization and Hybridization of Southern Blots

Solutions: 
[2341] 


100% Formamide: purchased from Gibco

20X SSC (1X = 0.15 M NaCl, 0.015 M Na3 citrate); per L:
175 g NaCl
87.5 g Na3 citrate·2H2O

20% Sarkosyl (N-lauroyl-sarcosine)
20% SDS (sodium dodecyl sulphate)
10% Blocking Reagent: In 80 ml deionized distilled water, dissolve 1.16 g maleic acid. Next, add NaOH to adjust
the pH to 7.5. Add 10 g of the blocking reagent powder. Heat to 60°C while stirring to dissolve the powder. Adjust
the volume to 100 ml with water. Stir and sterilize.




Prehybridization Mix:

Final Concentration    Components          Volume (per 100 ml)    Stock

50%                    Formamide           50 ml                  100%
5X                     SSC                 25 ml                  20X
0.1%                   Sarkosyl            0.5 ml                 20%
0.02%                  SDS                 0.1 ml                 20%
2%                     Blocking Reagent    20 ml                  10%
                       Water               4.4 ml
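
The volumes in the prehybridization mix follow from the ratio of final to stock concentration, with water making up the remainder. The sketch below recomputes the recipe for an arbitrary batch size; the function name and the dictionary layout are hypothetical.

```python
# component: (final concentration, stock concentration), in matching units (% or X)
PREHYB_COMPONENTS = {
    "Formamide": (50.0, 100.0),
    "SSC": (5.0, 20.0),
    "Sarkosyl": (0.1, 20.0),
    "SDS": (0.02, 20.0),
    "Blocking Reagent": (2.0, 10.0),
}


def prehyb_volumes(total_ml=100.0, components=PREHYB_COMPONENTS):
    """Volume of each stock for total_ml of mix; water fills the remainder."""
    volumes = {
        name: final / stock * total_ml
        for name, (final, stock) in components.items()
    }
    volumes["Water"] = total_ml - sum(volumes.values())
    return volumes


for name, ml in prehyb_volumes(100.0).items():
    print(f"{name}: {ml:.1f} ml")
```

Calling `prehyb_volumes(30.0)` would give the volumes for a single 30 ml batch, the amount suggested for a 100 cm2 blot.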



General Procedures: 
[2342] 

1. Place the blot in a heat-sealable plastic bag and add an appropriate volume of prehybridization solution (30 ml/
100 cm2) at room temperature. Seal the bag with a heat sealer, avoiding bubbles as much as possible. Lay down
the bags in a large plastic tray (one tray can accommodate at least 4-5 bags). Ensure that the bags are lying flat
in the tray so that the prehybridization solution is evenly distributed throughout the bag. Incubate the blot for at
least 2 hours with gentle agitation using a waver shaker.

2. Denature DIG-labeled DNA probe by incubating for 10 min. at 98°C using the PCR machine and immediately
cool it to 4°C.

3. Add probe to prehybridization solution (25 ng/ml; 30 ml = 750 ng total probe) and mix well but avoid foaming.
Bubbles may lead to background.

4. Pour off the prehybridization solution from the hybridization bags and add new prehybridization and probe
solution mixture to the bags containing the membrane.

5. Incubate with gentle agitation for at least 16 hours.

6. Proceed to medium stringency post-hybridization wash:

Three times for 20 min. each with gentle agitation using 1X SSC, 1% SDS at 60°C.

All wash solutions must be prewarmed to 60°C. Use about 100 ml of wash solution per membrane. 

To avoid background keep the membranes fully submerged to avoid drying in spots; agitate sufficiently to 
avoid having membranes stick to one another. 

7. After the wash, proceed to immunological detection and CSPD development. 
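
The two scaling rules in the procedure above (30 ml of solution per 100 cm2 of membrane, and 25 ng of probe per ml) can be combined to size the hybridization for a given blot. This is a minimal sketch; the function names and the example blot dimensions are hypothetical.

```python
SOLUTION_ML_PER_100_CM2 = 30.0   # from step 1
PROBE_NG_PER_ML = 25.0           # from step 3


def hyb_volume_ml(width_cm, height_cm):
    """Solution volume for a rectangular membrane."""
    return width_cm * height_cm / 100.0 * SOLUTION_ML_PER_100_CM2


def probe_ng(volume_ml):
    """Total DIG-labeled probe needed for the given volume."""
    return volume_ml * PROBE_NG_PER_ML


volume = hyb_volume_ml(10, 10)   # a 10 cm x 10 cm blot, as an example
print(f"{volume:.0f} ml of solution, {probe_ng(volume):.0f} ng of probe")
```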
E. Procedure for Immunological Detection with CSPD 

Solutions: 
[2343] 

Buffer 1: Maleic acid buffer (0.1 M maleic acid, 0.15 M NaCl; adjusted to pH 7.5 with NaOH)
Washing buffer: Maleic acid buffer with 0.3% (v/v) Tween 20.

Blocking stock solution (10X concentration): 10% blocking reagent in Buffer 1. Dissolve blocking reagent powder
(Boehringer Mannheim, Indianapolis, IN, cat. no. 1096176) by constantly stirring on a 65°C heating block or heat
in a microwave, autoclave and store at 4°C.

Buffer 2 (1X blocking solution): Dilute the stock solution 1:10 in Buffer 1.

Detection buffer: 0.1 M Tris, 0.1 M NaCl, pH 9.5

Procedure:

[2344]


1. After the post-hybridization wash the blots are briefly rinsed (1-5 min.) in the maleate washing buffer with gentle
shaking.

2. Then the membranes are incubated for 30 min. in Buffer 2 with gentle shaking.

3. Anti-DIG-AP conjugate (Boehringer Mannheim, Indianapolis, IN, cat. no. 1093274) at 75 mU/ml (1:10,000) in
Buffer 2 is used for detection. 75 ml of solution can be used for 3 blots.

4. The membrane is incubated for 30 min. in the antibody solution with gentle shaking.

5. The membranes are washed twice in washing buffer with gentle shaking. About 250 ml is used per wash for 3
blots.

6. The blots are equilibrated for 2-5 min in 60 ml detection buffer.

7. Dilute CSPD (1:200) in detection buffer. (This can be prepared ahead of time and stored in the dark at 4°C).
The following steps must be done individually. Bags (one for detection and one for exposure) are generally cut
and ready before doing the following steps.

8. The blot is carefully removed from the detection buffer and excess liquid removed without drying the membrane.
The blot is immediately placed in a bag and 1.5 ml of CSPD solution is added. The CSPD solution can be spread
over the membrane. Bubbles present at the edge and on the surface of the blot are typically removed by gentle
rubbing. The membrane is incubated for 5 min. in CSPD solution.

9. Excess liquid is removed and the membrane is blotted briefly (DNA side up) on Whatman 3MM paper. Do not
let the membrane dry completely.

10. Seal the damp membrane in a hybridization bag and incubate for 10 min at 37°C to enhance the luminescent
reaction.

11. Expose for 2 hours at room temperature to X-ray film. Multiple exposures can be taken. Luminescence continues
for at least 24 hours and signal intensity increases during the first hours.
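
The antibody and CSPD working solutions in the procedure above are both simple ratio dilutions, so the stock volume is the final volume divided by the dilution factor. The sketch below works this out for the volumes given in steps 3 and 8. It reads "1:N" as one part stock in N parts final volume, which is an assumption about the notation; the function name is hypothetical.

```python
def stock_ul(final_volume_ml, dilution_factor):
    """Microliters of stock for a 1:dilution_factor working solution."""
    return final_volume_ml * 1000.0 / dilution_factor


# Step 3: 75 ml of anti-DIG-AP conjugate at 1:10,000 (enough for 3 blots).
print(f"antibody stock: {stock_ul(75, 10_000):.1f} ul in 75 ml Buffer 2")

# Steps 7-8: 1.5 ml of CSPD at 1:200 per blot.
blots = 3
print(f"CSPD stock: {stock_ul(1.5 * blots, 200):.1f} ul for {blots} blots")
```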

Example 3: Transformation of Carrot Cells 

[2345] Transformation of plant cells can be accomplished by a number of methods, as described above. Similarly,
a number of plant genera can be regenerated from tissue culture following transformation. Transformation and
regeneration of carrot cells as described herein is illustrative.

[2346] Single cell suspension cultures of carrot (Daucus carota) cells are established from hypocotyls of cultivar
Early Nantes in B5 growth medium (O.L. Gamborg et al., Plant Physiol. 45:372 (1970)) plus 2,4-D and 15 mM CaCl2
(B5-44 medium) by methods known in the art. The suspension cultures are subcultured by adding 10 ml of the
suspension culture to 40 ml of B5-44 medium in 250 ml flasks every 7 days and are maintained in a shaker at 150 rpm at
27°C in the dark.

[2347] The suspension culture cells are transformed with exogenous DNA as described by Z. Chen et al., Plant Mol.
Bio. 36:163 (1998). Briefly, 4-days post-subculture cells are incubated with cell wall digestion solution containing 0.4
M sorbitol, 2% driselase, 5 mM MES (2-[N-Morpholino] ethanesulfonic acid) pH 5.0 for 5 hours. The digested cells are
pelleted gently at 60 x g for 5 min. and washed twice in W5 solution containing 154 mM NaCl, 5 mM KCl, 125 mM CaCl2
and 5 mM glucose, pH 6.0. The protoplasts are suspended in MC solution containing 5 mM MES, 20 mM CaCl2, 0.5
M mannitol, pH 5.7 and the protoplast density is adjusted to about 4 x 10^6 protoplasts per ml.

[2348] 15-60 µg of plasmid DNA is mixed with 0.9 ml of protoplasts. The resulting suspension is mixed with 40%
polyethylene glycol (MW 8000, PEG 8000), by gentle inversion a few times at room temperature for 5 to 25 min.
Protoplast culture medium known in the art is added into the PEG-DNA-protoplast mixture. Protoplasts are incubated
in culture medium for 24 hours to 5 days and cell extracts can be used for assay of transient expression of the
introduced gene. Alternatively, transformed cells can be used to produce transgenic callus, which in turn can be used
to produce transgenic plants, by methods known in the art. See, for example, Nomura and Komamine, Plant Phys. 79:
988-991 (1985), Identification and Isolation of Single Cells that Produce Somatic Embryos in Carrot Suspension
Cultures.

[2349] The invention being thus described, it will be apparent to one of ordinary skill in the art that various
modifications of the materials and methods for practicing the invention can be made. Such modifications are to be considered
within the scope of the invention as defined by the following claims.

[2350] Each of the references from the patent and periodical literature cited herein is hereby expressly incorporated 
in its entirety by such citation. 


Claims

1. An isolated nucleic acid molecule comprising a nucleic acid having a nucleotide sequence which encodes an amino
acid sequence exhibiting at least 40% sequence identity to an amino acid sequence encoded by

(a) a nucleotide sequence described in REF and/or SEQ Table 1 or 2 or a fragment thereof; or

(b) a complement of a nucleotide sequence shown in REF and/or SEQ Table 1 or 2 or a fragment thereof.

2. An isolated nucleic acid molecule comprising a nucleic acid having a nucleotide sequence which exhibits at least
65% sequence identity to

(a) a nucleotide sequence shown in REF and/or SEQ Table 1 or 2 or a fragment thereof; or

(b) a complement of a nucleotide sequence shown in REF and/or SEQ Table 1 or 2 or a fragment thereof.

3. An isolated nucleic acid molecule comprising a nucleic acid having a nucleotide sequence which exhibits at least
65% sequence identity to a gene comprising

(a) a nucleotide sequence shown in REF and/or SEQ Table 1 or 2 or a fragment thereof; or
(b) a complement of a nucleotide sequence shown in REF and/or SEQ Table 1 or 2 or a fragment thereof.

4. An isolated nucleic acid molecule which is the reverse of the isolated nucleotide sequence according to any one
of claims 1-3, such that the reverse nucleotide sequence has a sequence order which is the reverse of the sequence
order of said isolated nucleotide sequence according to any one of claims 1-3.

5. An isolated nucleic acid molecule comprising a nucleic acid capable of hybridizing to a nucleic acid having a
sequence selected from the group consisting of:

(a) a nucleotide sequence which is shown in REF and/or SEQ Table 1 or 2; and
(b) a nucleotide sequence which is complementary to a nucleotide sequence shown in REF and/or SEQ Table
1 or 2;

under conditions that permit formation of a nucleic acid duplex at a temperature from about 40°C and 48°C below
the melting temperature of the nucleic acid duplex.


6. The nucleic acid molecule according to any one of claims 1-5, wherein said nucleic acid comprises an open reading
frame.

7. The isolated nucleic acid molecule of any one of claims 1-5, wherein said nucleic acid is capable of functioning as
a promoter, a 3' end termination sequence, an untranslated region (UTR), or as a regulatory sequence.

8. The isolated nucleic acid molecule of claim 7, wherein said nucleic acid is a promoter and comprises a sequence
selected from the group consisting of a TATA box sequence, a CAAT box sequence, a motif of GCAATCG or any
transcription-factor binding sequence, and any combination thereof.

9. The isolated nucleic acid molecule of claim 7, wherein the nucleic acid sequence is a regulatory sequence which
is capable of promoting seed-specific expression, embryo-specific expression, ovule-specific expression,
tapetum-specific expression or root-specific expression of a sequence or any combination thereof.

10. A vector construct comprising a nucleic acid molecule according to any one of claims 1-9, wherein said nucleic 
acid molecule is heterologous to any element in said vector construct. 

11. A vector construct according to claim 10 comprising:

(a) a first nucleic acid having a regulatory sequence capable of causing transcription and/or translation; and

(b) a second nucleic acid having the sequence of said isolated nucleic acid molecule according to any one of
claims 1-4;

wherein said first and second nucleic acids are operably linked and wherein said second nucleic acid is
heterologous to any element in said vector construct.

12. The vector construct according to claim 11, wherein said first nucleic acid is native to said second nucleic acid.

13. The vector construct according to claim 11, wherein said first nucleic acid is heterologous to said second nucleic
acid.

14. A vector construct according to claim 10 comprising:

(c) a first nucleic acid having the sequence of said isolated nucleic acid molecule according to claim
7; and

(d) a second nucleic acid;

wherein said first and second nucleic acids are operably linked and wherein said first nucleic acid is heterologous
to any element in said vector construct.

15. The vector construct according to claim 14, wherein said first nucleic acid is native to said second nucleic acid.

16. The vector construct according to claim 14, wherein said first nucleic acid is heterologous to said second nucleic
acid.

17. A host cell comprising an isolated nucleic acid molecule according to any one of claims 1-4, wherein said nucleic
acid molecule is flanked by exogenous sequence.

18. A host cell comprising a vector construct of any one of claims 10-16. 

19. An isolated polypeptide comprising an amino acid sequence 

(a) exhibiting at least 40% sequence identity of an amino acid sequence encoded by a sequence shown in
REF and/or SEQ Table 1 or 2 or a fragment thereof; and

(b) capable of exhibiting at least one of the biological activities of the polypeptide encoded by said nucleotide
sequence shown in REF and/or SEQ Table 1 or 2 or a fragment thereof.

20. The isolated polypeptide of claim 19, wherein said amino acid sequence exhibits at least 75% sequence identity
to an amino acid sequence encoded by a sequence shown in SEQ Table 1 or 2 or a fragment thereof.

21. The isolated polypeptide of claim 19, wherein said amino acid sequence exhibits at least 85% sequence identity
to an amino acid sequence encoded by a sequence shown in SEQ Table 1 or 2 or a fragment thereof.

22. The isolated polypeptide of claim 19, wherein said amino acid sequence exhibits at least 90% sequence identity
to an amino acid sequence encoded by a sequence shown in SEQ Table 1 or 2 or a fragment thereof.

23. An antibody capable of binding the isolated polypeptide of any one of claims 19-22. 

24. A method of introducing an isolated nucleic acid into a host cell comprising: 

(a) providing an isolated nucleic acid molecule according to any one of claims 1-4; and

(b) contacting said isolated nucleic acid with said host cell under conditions that permit insertion of said nucleic
acid into said host cell.

25. A method of transforming a host cell which comprises contacting a host cell with a vector construct according to
any one of claims 10-16.

26. A method of modulating transcription and/or translation of a nucleic acid in a host cell comprising:

(a) providing the host cell of claim 24 or 25; and
(b) culturing said host cell under conditions that permit transcription or translation.

27. A method for detecting a nucleic acid in a sample which comprises:

(a) providing an isolated nucleic acid molecule according to any one of claims 1-5;
(b) contacting said isolated nucleic acid molecule with a sample under conditions which permit a comparison
of the sequence of said isolated nucleic acid molecule with the sequence of DNA in said sample; and

(c) analyzing the result of said comparison.

28. The method according to claim 27, wherein said isolated nucleic acid molecule and said sample are contacted
under conditions which permit the formation of a duplex between complementary nucleic acid sequences.

29. A plant or cell of a plant which comprises a nucleic acid molecule according to any one of claims 1-4 which is
exogenous to said plant or plant cell.

30. A plant or cell of a plant which comprises a nucleic acid molecule according to any one of claims 1-4, wherein said
nucleic acid molecule is heterologous to said plant or said cell of a plant.

31. A plant or cell of a plant which has been transformed with a nucleic acid molecule according to any one of claims 1-4.

32. A plant or cell of a plant which comprises a vector construct according to any one of claims 10-16.

33. A plant or cell of a plant which has been transformed with a vector construct according to any one of claims 10-16.

34. A plant which has been regenerated from a plant cell according to any one of claims 29-33. 

Sequence-determined DNA fragments and corresponding polypeptides encoded thereby 
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